ABSTRACT

A system for facilitating exchange of user information and 
opinion using automated collaborative filtering includes 
memory elements for storing item profiles and user profiles. 
The data contained in those profiles is used to calculate a 
number of similarity factors representing how closely the 
preferences of one user correlate with another. The similarity 
factors are evaluated to select a set of neighboring users for 
each user which represents the set of users which most 
closely correlate with a particular user. The system assigns 
a weight to each one of the neighboring users. The system 
uses the ratings given to items by those neighboring users to 
recommend an item to a user. The system may be 
distributed, i.e. the system may include a number of nodes 
connected to a central server. The central server includes a 
memory element for storing user profile data and the nodes 
may be the type of system described above. 
DISTRIBUTED SYSTEM FOR FACILITATING EXCHANGE OF USER
INFORMATION AND OPINION USING AUTOMATED COLLABORATIVE
FILTERING

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part application of
application Ser. No. 08/597,442 filed Feb. 2, 1996, now
abandoned, which itself claims priority to provisional
application Ser. No. 60/000^98, filed Jun. 30, 1995, now
abandoned, and provisional application 60/008,458, filed
Dec. 11, 1995, now abandoned, both of which are now expired
and are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to a system for facilitating
exchange of user information and opinion and, in particular,
to a distributed method for facilitating exchange of user
information and opinion using automated collaborative
filtering.

BACKGROUND OF THE INVENTION

The amount of information, as well as the number of goods
and services, available to individuals is increasing
exponentially. This increase in items and information is
occurring across all domains, e.g. sound recordings,
restaurants, movies, World Wide Web pages, clothing stores,
etc. An individual attempting to find useful information, or
to decide between competing goods and services, is often
faced with a bewildering selection of sources and choices.

Individual sampling of all items, even in a particular
domain, may be impossible. For example, sampling every
restaurant of a particular type in New York City would tax
even the most avid diner. Such a sampling would most likely
be prohibitively expensive to carry out, and the diner would
have to suffer through many unenjoyable restaurants.

In many domains, individuals have simply learned to manage
information overload by relying on a form of generic referral
system. For example, in the domain of movie and sound
recordings, many individuals rely on reviews written by paid
reviewers. These reviews, however, are simply the viewpoint
of one or two individuals and may not have a likelihood of
correlating with how the individual will actually perceive
the movie or sound recording. Many individuals may rely on a
review only to be disappointed when they actually sample the
item.

One method of attempting to provide an efficient filtering
mechanism is to use content-based filtering. The content-based
filter selects items from a domain for the user to sample
based upon correlations between the content of the item and
the user's preferences. Content-based filtering schemes
suffer from the drawback that the items to be selected must
be in some machine-readable form, or attributes describing
the content of the item must be entered by hand. This makes
content-based filtering problematic for existing items such
as sound recordings, photographs, art, video, and any other
physical item that is not inherently machine-readable. While
item attributes can be assigned by hand in order to allow a
content-based search, for many domains of items such
assignment is not practical. For example, it could take
decades to enter even the most rudimentary attributes for all
available network television video clips by hand.

Perhaps more importantly, even the best content-based
filtering schemes cannot provide an analysis of the quality
of a particular item as it would be perceived by a particular
user, since quality is inherently subjective. So, while a
content-based filtering scheme may select a number of items
based on the content of those items, a content-based
filtering scheme generally cannot further refine the list of
selected items to recommend items that the individual will
enjoy.

SUMMARY OF THE INVENTION

The present invention relates to a system which collects a
number of subjective ratings given to items by users. The
described system allows users to provide ratings wherever
and whenever such provision is convenient for the user. For
example, a user may provide ratings for objects in the
comfort and privacy of their own home via the internet, or a
user may provide ratings at a retail establishment
specializing in particular items. The system also allows the
rating information provided by the users to be used to
recommend items to the user and to locate individuals having
similar tastes. The system may also be used to allow users
having similar tastes to communicate with each other.

In one aspect, the present invention relates to a system for
facilitating exchange of user information and opinion about
items which includes memory elements for storing user
profiles and item profiles. The system also includes a
calculator for calculating similarity factors between users
and a selector for selecting neighboring users for each user
based on the similarity factors. The system assigns a weight
to each of the neighboring users and uses the ratings given
to items by those neighboring users to recommend an item to a
user. In some embodiments, the system includes a
communication means that allows users to engage in dialog
with each other and share information about items. In other
embodiments, the system includes a user recommender that
refers users to other users based on the similarity factors
calculated by the system.

In another aspect, the present invention relates to a
distributed system for facilitating exchange of user
information and opinion. The distributed system includes a
central server which is connected to a network and the server
includes a memory element for storing user profile data. At
least one node is connected to the network and the node
includes a memory element for storing user profile
registration information, a receiver for receiving user
profile registration information and a transmitter for
transmitting the received user profile registration
information to the central server. In some embodiments, the
node periodically tries to transmit user profile registration
information to the central server. The node may also include
memory elements for storing user profiles and item profiles,
a calculator for calculating similarity factors between users
of the distributed system, a selector for selecting a
plurality of neighboring users based on the calculated
similarity factors, a means for assigning a weight to each of
those neighboring users, and an item recommender for
recommending items to users based on ratings given to items
by the neighboring users and the weights assigned to those
neighboring users.

DESCRIPTION OF THE DRAWINGS

The invention is pointed out with particularity in the
appended claims. The above and further advantages of this
invention may be better understood by referring to the
following description taken in conjunction with the
accompanying drawings, in which:

FIG. 1 is a flowchart of one embodiment of the method;

FIG. 2 is a diagrammatic view of a user profile-item profile
matrix;
FIG. 3 is a flowchart of another embodiment of the method;

FIG. 4 is a block diagram of an embodiment of the apparatus;

FIG. 5 is a block diagram of an Internet system on which the
method and apparatus may be used;

FIG. 6 is a block diagram of a distributed system for
facilitating exchange of user information and opinion;

FIG. 7 is a flow chart of the steps taken to register a user;

FIG. 8 is a flow chart of the steps taken to verify whether
an alias is in use.

DETAILED DESCRIPTION OF THE INVENTION

As referred to in this description, items to be recommended
can be items of any type that a user may sample in a domain.
When reference is made to a "domain," it is intended to refer
to any category or subcategory of ratable items, such as
sound recordings, movies, restaurants, vacation destinations,
novels, or World Wide Web pages. Referring now to FIG. 1, a
method for recommending items begins by storing user and item
information in profiles.

A plurality of user profiles is stored in a memory element
(step 102). One profile may be created for each user or
multiple profiles may be created for a user to represent that
user over multiple domains. Alternatively, a user may be
represented in one domain by multiple profiles where each
profile represents the proclivities of a user in a given set
of circumstances. For example, a user that avoids seafood
restaurants on Fridays, but not on other days of the week,
could have one profile representing the user's restaurant
preferences from Saturday through Thursday, and a second
profile representing the user's restaurant preferences on
Fridays. In some embodiments, a user profile represents more
than one user. For example, a profile may be created which
represents a woman and her husband for the purpose of
selecting movies. Using this profile allows a movie
recommendation to be given which takes into account the movie
tastes of both individuals. For convenience, the remainder of
this specification will use the term "user" to refer to
single users of the system as well as "composite users." The
memory element can be any memory element known in the art
that is capable of storing user profile data and allowing the
user profiles to be updated, such as disc drive or random
access memory.

Each user profile associates items with the ratings given to
those items by the user. Each user profile may also store
information in addition to the user's rating. In one
embodiment, the user profile stores information about the
user, e.g. name, address, or age. In another embodiment, the
user profile stores information about the rating, such as the
time and date the user entered the rating for the item. User
profiles can be any data construct that facilitates these
associations, such as an array, although it is preferred to
provide user profiles as sparse vectors of n-tuples. Each
n-tuple contains at least an identifier representing the
rated item and an identifier representing the rating that the
user gave to the item, and may include any number of
additional pieces of information regarding the item, the
rating, or both. Some of the additional pieces of information
stored in a user profile may be calculated based on other
information in the profile, for example, an average rating
for a particular selection of items (e.g., heavy metal
albums) may be calculated and stored in the user's profile.
In some embodiments, the profiles are provided as ordered
n-tuples. Alternatively, a user profile may be provided as an
array of pointers; each pointer is associated with an item
rated by the user and points to the rating and information
associated with the rating.

A profile for a user can be created and stored in a memory
element when that user first begins rating items, although in
multi-domain applications user profiles may be created for
particular domains only when the user begins to explore, and
rate items within, those domains. Alternatively, a user
profile may be created for a user before the user rates any
items in a domain. For example, a default user profile may be
created for a domain which the user has not yet begun to
explore based on the ratings the user has given to items in a
domain that the user has already explored.

Whenever a user profile is created, a number of initial
ratings for items may be solicited from the user. This can be
done by providing the user with a particular set of items to
rate corresponding to a particular group of items. Groups are
discussed below in more detail. Other methods of soliciting
ratings from the user may include: manual entry of
item-rating pairs, in which the user simply submits a list of
items and ratings assigned to those items; soliciting ratings
by date of entry into the system, i.e., asking the user to
rate the newest items added to the system; soliciting ratings
for the items having the most ratings; or by allowing a user
to rate items similar to an initial item selected by the
user. In still other embodiments, the system may acquire a
number of ratings by monitoring the user's environment. For
example, the system may assume that Web sites for which the
user has created "bookmarks" are liked by that user and may
use those sites as initial entries in the user's profile. One
embodiment uses all of the methods described above and allows
the user to select the particular method they wish to employ.

Ratings for items which are received from users can be of any
form that allows users to record subjective impressions of
items based on their experience of the item. For example,
items may be rated on an alphabetic scale ("A" to "F") or a
numerical scale (1 to 10). In one embodiment, ratings are
integers between 1 (lowest) and 7 (highest). Ratings can be
received as input to a stand-alone machine, for example, a
user may type rating information on a keyboard or a user may
enter such information via a touch screen. Ratings may also
be received as input to a system via electronic mail, by
telephone, or as input to a system via a local area or wide
area network. In one embodiment, ratings are received as
input to a World Wide Web page. In this embodiment, the user
positions a cursor on a World Wide Web page with an input
device such as a mouse or trackball. Once the cursor is
properly positioned, the user indicates a rating by using a
button on the input device to select a rating to enter.
Ratings can be received from users singularly or in batches,
and may be received from any number of users simultaneously.

Ratings can be inferred by the system from the user's usage
pattern. For example, the system may monitor how long the
user views a particular Web page and store in that user's
profile an indication that the user likes the page, assuming
that the longer the user views the page, the more the user
likes the page. Alternatively, a system may monitor the
user's actions to determine a rating of a particular item for
the user. For example, the system may infer that a user likes
an item which the user mails to many people and enter in the
user's profile an indication that the user likes that item.
More than one aspect of user behavior may be monitored in
order to infer ratings for that user, and in some
embodiments, the system may have a higher confidence factor
for a rating which it inferred by monitoring multiple aspects
of user behavior. Confidence factors are discussed in more
detail below.
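
The user profile described above, a sparse vector of n-tuples keyed by the rated item, might be sketched as follows. This is an illustrative sketch only: the names `RatingEntry`, `UserProfile`, `rate`, and `average_rating` are hypothetical, and the extra fields (`entered_at`, `inferred`) are just two examples of the additional information the text says an n-tuple may carry.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class RatingEntry:
    """One n-tuple: item identifier, rating, and optional extra fields."""
    item_id: str
    rating: int                             # e.g. an integer from 1 (lowest) to 7 (highest)
    entered_at: Optional[datetime] = None   # hypothetical extra field: when the rating was made
    inferred: bool = False                  # hypothetical extra field: inferred rather than entered


@dataclass
class UserProfile:
    """Sparse profile: only items the user has actually rated are stored."""
    user_id: str
    entries: Dict[str, RatingEntry] = field(default_factory=dict)

    def rate(self, item_id: str, rating: int,
             entered_at: Optional[datetime] = None, inferred: bool = False) -> None:
        # A new rating adds an n-tuple; a changed rating overwrites the old one.
        self.entries[item_id] = RatingEntry(item_id, rating, entered_at, inferred)

    def average_rating(self, item_ids: Iterable[str]) -> Optional[float]:
        # Derived value: mean rating over a selection of items (e.g. one genre).
        rated = [self.entries[i].rating for i in item_ids if i in self.entries]
        return sum(rated) / len(rated) if rated else None
```

Keying the entries by item identifier keeps the profile sparse and makes "change an existing rating" a simple overwrite; an array of pointers, as the text also allows, would serve the same purpose.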
Profiles for each item that has been rated by at least one
user may also be stored in memory. Each item profile records
how particular users have rated this particular item. Any
data construct that associates ratings given to the item with
the user assigning the rating can be used. It is preferred to
provide item profiles as a sparse vector of n-tuples. Each
n-tuple contains at least an identifier representing a
particular user and an identifier representing the rating
that user gave to the item, and it may contain other
information, as described above in connection with user
profiles. As with user profiles, item profiles may also be
stored as an array of pointers. Item profiles may be created
when the first rating is given to an item or when the item is
first entered into the system. Alternatively, item profiles
may be generated from the user profiles stored in memory, by
determining for each user, if that user has rated the item
and, if so, storing the rating and user information in the
item's profile. Item profiles may be stored before user
profiles are stored, after user profiles are stored, or at
the same time as user profiles. For example, referring to
FIG. 2, item profile data and user profile data may be stored
as a matrix of values which provides user profile data when
read "across," i.e. when rows of the matrix are accessed, and
provides item profile data when read "down," i.e. when
columns of the matrix are accessed. A data construct of this
sort could be provided by storing a set of user n-tuples and
a set of item n-tuples. In order to read a row of the matrix
a specific user n-tuple is accessed and in order to read a
column of the matrix a specific item n-tuple is selected.

The additional information associated with each item-rating
pair can be used by the system for a variety of purposes,
such as assessing the validity of the rating data. For
example, if the system records the time and date the rating
was entered, or inferred from the user's environment, it can
determine the age of a rating for an item. A rating which is
very old may indicate that the rating is less valid than a
rating entered recently, for example, users' tastes may
change or "drift" over time. One of the fields of the n-tuple
may represent whether the rating was entered by the user or
inferred by the system. Ratings that are inferred by the
system may be assumed to be less valid than ratings that are
actually entered by the user. Other items of information may
be stored, and any combination or subset of additional
information may be used to assess rating validity. In some
embodiments, this validity metric may be represented as a
confidence factor, that is, the combined effect of the
selected pieces of information recorded in the n-tuple may be
quantified as a number. In some embodiments, that number may
be expressed as a percentage representing the probability
that the associated rating is incorrect or as an expected
deviation of the predicted rating from the "correct" value.

Since the system may be hosted by any one of a number of
different types of machines, or by a machine that is
reconfigured frequently, it is desirable to provide data
storage for profiles in a hierarchical, isolated manner. The
term "isolated," for the purposes of this specification,
means that the interface to the physical memory elements
storing item and user profiles is abstracted, i.e. the system
interacts with the physical memory elements through a defined
data object. Although the description of such a data object
is couched in terms of profile data and the associated system
for recommending items to users, the data object can be used
in any system requiring that access to data be provided in an
isolated, hierarchical manner, such as databases or
distributed file systems.

A data object of the sort described provides an abstraction
of a physical memory in which profiles are stored. The data
object includes an interface for storing data to the physical
memory, an interface for retrieving data from the physical
memory, an interface for searching the physical memory and a
link to another data object. In some embodiments the data
object is provided with "batch" capability, which will be
described in detail below.

The interfaces for storing and retrieving profiles from a
physical memory implement those functions in a physical
memory-specific manner. For example, a data object providing
an abstraction of a disk drive memory would accept a "store
profile" or "retrieve profile" command from the system and
issue the appropriate device driver commands to the disk
drive with which it is associated. These commands may be a
simple translation of the "store profile" command received
into a "write" command issued to the disk drive, or the data
object may translate the "store profile" command into a
series of "write" commands issued to the disk drive. Data
retrieved from the physical memory is provided to the system
via the interface for retrieving data.

The interfaces for storing and retrieving data may be
provided as independent functions, dynamically loaded
libraries, or subroutines within the object. It is only
necessary for the data object to access the underlying
physical memory element to retrieve and store the data
element, i.e. profile, requested; the data object need not
implement functionality provided by the memory element unless
it is desirable to do so. For example, a data object
representing a cache memory need not implement functionality
for retrieving cache misses from main memory, although it may
be desirable to implement a "cache flush" command in the data
object that flushes the underlying physical memory.

The data object includes an interface for searching the
physical memory. The interface accepts one or more criteria
for screening data retrieved from the underlying physical
memory. For example, the system may instruct the data object
to retrieve all profiles having ratings for a particular item
in excess of "5." Alternatively, the system could instruct
the data object to return the profiles of all users younger
than 21. The data object receives the criterion and can
accomplish the screen by accessing all the profile
information stored in the associated physical memory,
applying the requested criterion, and providing the system
with any profile that passes. Alternatively, the data object
could use some other algorithm for screening the data, such
as running an SQL search on a stored table, or storing the
profile data in a tree structure or hash table which allows
the physical memory to be efficiently searched.

The "criterion" feature just described is an explication of
one of the advantages of the data object described. The
system does not need to specify physical memory addresses to
access profile data. The system specifies a profile, or set
of profiles, it desires to transfer by reference to profile
information. For example, the data object accepts desired
profile information from the system (which includes name
data, some item of demographic information, rating
information, or some set of this information) and implements
the physical memory transfer.

The link identifies another data object to be accessed if the
data request cannot be satisfied by the underlying physical
memory. For example, a data object representing random access
memory may be accessed to retrieve user profiles having a
state address equal to "Massachusetts." If no user profiles
stored in the underlying physical memory match the provided
criterion, the link, which identifies another data object, is
followed. If the link identifies another data object,
i.e. if the link is not a null pointer, the system attempts to 
fulfill its request from the data object identified by the link. 
If, in turn, the request cannot be satisfied by the second- 
identified data object, and the second-identified data object 
is linked to a third data object, the system attempts to fulfill 
its request from the third-identified data object. This process 
continues until a "null" link is encountered. 

The link can be used to arrange the data objects into a 
hierarchy which corresponds to the order in which the 
system accesses memory. For example, the system may be 
provided with a "cache" data object that is linked to a "main 
memory" data object, which is in turn linked to a "disk" 
memory object that is itself linked to a "network." Thus, a 
system would issue a "retrieve profile" request to the 
"cache" data object with a criterion of "name=john_smith". 
If the cache memory is unable to satisfy this request, it is 
presented to the next data object in the hierarchy, i.e. the 
"main memory" data object. If the request is satisfied from 
main memory, the user profile is returned to the cache, which 
can then satisfy the data request. The hierarchy of data 
objects provided by the links can be set up once for a given 
system or the links may be dynamically rearranged. If the 
links are set up in a static fashion, they may be specified by 
a configuration file or, in some applications, the links may be 
hardcoded. Dynamic reconfiguration of the links provides a 
system with the ability to reconfigure its memory hierarchy 
in response to run-time failures, e.g. a hard drive crash. 

When a lower-level data object in the hierarchy satisfies 
a request that was not able to be fulfilled by a higher-level 
data object in the hierarchy, the lower-level object returns 
the result to the next higher-level data object. The higher- 
level data object writes the result into its underlying physical 
memory, and returns the result to another higher-level data 
object, if necessary. In this manner, memory may be 
accessed in a hierarchical, isolated manner and data can be 
transparently distributed to the most efficient level of 
memory. 
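
The linked hierarchy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `DataObject`, its method names, and the use of an in-memory dictionary as a stand-in for each level of physical memory are all hypothetical. A criterion is modelled as a predicate over a profile.

```python
from typing import Callable, Dict, Optional

Profile = Dict[str, object]
Criterion = Callable[[Profile], bool]


class DataObject:
    """Abstracts one level of memory; `link` identifies the next level, or None."""

    def __init__(self, name: str, link: Optional["DataObject"] = None):
        self.name = name
        self.link = link
        self._store: Dict[str, Profile] = {}   # stand-in for the physical memory

    def store_profile(self, key: str, profile: Profile) -> None:
        self._store[key] = profile

    def search(self, criterion: Criterion) -> Dict[str, Profile]:
        # Simplest possible screen: examine every stored profile.
        return {k: p for k, p in self._store.items() if criterion(p)}

    def retrieve_profile(self, criterion: Criterion) -> Dict[str, Profile]:
        hits = self.search(criterion)
        if hits or self.link is None:
            return hits                         # satisfied here, or a null link was reached
        # Miss: follow the link, then write the result into this level
        # before handing it back to the caller.
        hits = self.link.retrieve_profile(criterion)
        for key, profile in hits.items():
            self.store_profile(key, profile)
        return hits
```

A chain such as `cache = DataObject("cache", link=DataObject("main memory", link=DataObject("disk")))` then behaves like the example in the text: a request that misses the cache is passed down, and each level on the way back keeps a copy. One simplification to note is that any hit at a higher level ends the search, even if lower levels hold further matching profiles.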

In some embodiments it may be desirable to provide a 
data object with "batch" capability, i.e. the data object will 
retrieve more data than requested in an attempt to increase 
performance. This capability may be provided as a flag that, 
when set, indicates that the data object should retrieve more 
data than requested. Alternatively, the data object may be 
provided with a function or subroutine which indicates to the 
data object when and how much should be retrieved in 
various situations, or the data object may accept input (e.g. 
in the form of a passed parameter) from the system instructing 
it to initiate a batch transfer. For example, a data object 
may be provided with logic that examines requests and, if 
the request is one for a user profile, initiates an access of four 
user profiles. The amount and frequency of such "look-ahead" 
memory accessing may be varied in order to take advantage 
of physical memory characteristics, such as latency and 
size. 
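
One way to picture the "batch" capability is a store that, when its flag is set, reads several neighbouring profiles for each request. The sketch below is only an assumption about how such look-ahead could work: `BatchingStore`, its fields, and the choice of "the next profiles in key order" as the look-ahead policy are all hypothetical.

```python
from typing import Dict, List


class BatchingStore:
    """Reads `batch_size` profiles per access to slow memory when `batch` is set."""

    def __init__(self, backing: Dict[str, dict], batch_size: int = 4, batch: bool = True):
        self.backing = backing                  # stand-in for slow physical memory
        self.keys: List[str] = sorted(backing)  # assumed look-ahead order
        self.batch_size = batch_size
        self.batch = batch                      # the "batch" flag from the text
        self.buffer: Dict[str, dict] = {}       # profiles already fetched
        self.backing_reads = 0                  # number of accesses to slow memory

    def retrieve(self, key: str) -> dict:
        if key in self.buffer:
            return self.buffer[key]             # served by an earlier batch
        start = self.keys.index(key)            # raises ValueError for unknown keys
        count = self.batch_size if self.batch else 1
        for k in self.keys[start:start + count]:
            self.buffer[k] = self.backing[k]
        self.backing_reads += 1
        return self.buffer[key]
```

With a batch size of four, as in the example in the text, a request for one user profile triggers an access of four, so the next three requests in key order cost no further access to the slow memory.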

Whether a hierarchical, isolated data store such as the one 
described above is provided or not, the user profiles are 
accessed in order to calculate a similarity factor for each user 
with respect to all other users (step 104). A similarity factor 
represents the degree of correlation between any two users 
with respect to a set of items. The calculation to be per- 
formed may be selected such that the more two users 
correlate, the closer the similarity factor is to zero. Special- 
ized hardware may be provided for calculating the similarity 
factors between users, although it is preferred to provide a 
general-purpose computer with appropriate programming to 
calculate the similarity factors. 

Whenever a rating is received from a user or is inferred by 
the system from that user's behavior, the profile of that user 
may be updated as well as the profile of the item rated. 
Profile updates may be stored in a temporary memory 
location and entered at a convenient time or profiles may be 
updated whenever a new rating is entered by, or inferred for, 
that user. Profiles can be updated by appending a new 
n-tuple of values to the set of already existing n-tuples in the 
profile or, if the new rating is a change to an existing rating, 
overwriting the appropriate entry in the user profile. Updat- 
ing a profile also requires re-computation of any profile 
entries that are based on other information in the profile. 

Whenever a user's profile is updated with a new rating-item 
n-tuple, new similarity factors between the user and other 
users of this system may be calculated. In other 
embodiments, similarity factors are periodically 
recalculated, or recalculated in response to some other 
stimulus, such as a change in a neighboring user's profile. 
The similarity factor for a user may be calculated by 
comparing that user's profile with the profile of every other 
user of the system. This is computationally intensive, since 
the order of computation for calculating similarity factors in 
this manner is n², where n is the number of users of the 
system. It is possible to reduce the computational load 
associated with re-calculating similarity factors in embodi- 
ments that store item profiles by first retrieving the profile 
of the newly-rated item and determining which other users 
have already rated that item. The similarity factors between 
the newly-rating user and the users that have already rated 
the item are the only similarity factors updated. 
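
The reduced recalculation described above can be sketched as follows, assuming profiles are held as plain dictionaries. The function name `record_rating`, its parameters, and the dictionary layout are hypothetical; the `similarity` argument stands for whichever similarity method is in use.

```python
from typing import Callable, Dict, List, Tuple

Ratings = Dict[str, float]   # maps an identifier to a rating


def record_rating(user_profiles: Dict[str, Ratings],
                  item_profiles: Dict[str, Ratings],
                  similarities: Dict[Tuple[str, str], float],
                  similarity: Callable[[Ratings, Ratings], float],
                  user: str, item: str, rating: float) -> List[str]:
    """Store a rating and refresh only the similarity factors it can affect."""
    user_profiles.setdefault(user, {})[item] = rating   # update the user profile
    raters = item_profiles.setdefault(item, {})
    raters[user] = rating                                # update the item profile
    updated = []
    # The item profile lists exactly the users who have rated this item, so
    # only those pairs are recomputed rather than all pairs of users.
    for other in raters:
        if other == user:
            continue
        pair = tuple(sorted((user, other)))
        similarities[pair] = similarity(user_profiles[user], user_profiles[other])
        updated.append(other)
    return updated
```

The work per new rating is then proportional to the number of users who have rated that one item instead of the total number of users in the system.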
Any number of methods can be used to calculate the 
similarity factors. In general, a method for calculating 
similarity factors between users should minimize the devia- 
tion between a predicted rating for an item and the rating a 
user would actually have given the item. 

It is also desirable to reduce error in cases involving 
"extreme" ratings. That is, a method which predicts fairly 
well for item ratings representing ambivalence towards an 
item but which does poorly for item ratings representing 
extreme enjoyment or extreme disappointment with an item 
is not useful for recommending items to users. 
A similarity factor between users refers to any quantity 
which expresses the degree of correlation between two 
users' profiles for a particular set of items. The following 
methods for calculating the similarity factor are intended to 
be exemplary, and in no way exhaustive. Depending on the 
item domain, different methods will produce optimal results, 
since users in different domains may have different expec- 
tations for rating accuracy or speed of recommendations. 
Different methods may be used in a single domain, and, in 
some embodiments, the system allows users to select the 
method by which they want their similarity factors pro- 
duced. 

In the following description of methods, $D_{xy}$ represents 
the similarity factor calculated between two users, x and y. 
$H_{ix}$ represents the rating given to item i by user x, I 
represents all items in the database, and $c_{ix}$ is a Boolean 
quantity which is 1 if user x has rated item i and 0 if user x 
has not rated that item. 

One method of calculating the similarity between a pair of 
users is to calculate the average squared difference between 
their ratings for mutually rated items. Thus, the similarity 
factor between user x and user y is calculated by subtracting, 
for each item rated by both users, the rating given to an item 
by user y from the rating given to that same item by user x 
and squaring the difference. The squared differences are 
summed and divided by the total number of items rated. This 
method is represented mathematically by the following 
expression: 

$$D_{xy} = \frac{\sum_{i \in I} c_{ix}\, c_{iy}\, (H_{ix} - H_{iy})^2}{\sum_{i \in I} c_{ix}\, c_{iy}}$$

A similar method of calculating the similarity factor 
between a pair of users is to divide the sum of their squared 
rating differences by the number of items rated by both users 
raised to a power. This method is represented by the fol- 
lowing mathematical expression: 

$$D_{xy} = \frac{\sum_{i \in I} c_{ix}\, c_{iy}\, (H_{ix} - H_{iy})^2}{|C_{xy}|^{k}}$$

where $|C_{xy}|$ represents the number of items mutually rated by 
users x and y, and k is the power to which that number is raised. 
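
Both squared-difference methods above can be sketched in a few lines, assuming each user profile is reduced to a dictionary mapping item identifiers to ratings. The function name `mean_squared_difference` and the `power` parameter are hypothetical; `power=1` gives the plain average, and other values give the variant in which the count of mutually rated items is raised to a power.

```python
from typing import Dict, Optional

Ratings = Dict[str, float]   # item identifier -> rating


def mean_squared_difference(x: Ratings, y: Ratings, power: float = 1.0) -> Optional[float]:
    """Sum of squared rating differences over mutually rated items,
    divided by the number of such items raised to `power`."""
    common = x.keys() & y.keys()          # items rated by both users
    if not common:
        return None                       # no overlap: the factor is undefined
    total = sum((x[i] - y[i]) ** 2 for i in common)
    return total / len(common) ** power
```

With this measure a smaller value means the two users agree more closely, and identical ratings on every shared item give zero.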

In another embodiment, the similarity factor between two
users is a Pearson r correlation coefficient. Alternatively, the
similarity factor may be calculated by constraining the
correlation coefficient with a predetermined average rating
value, A. Using the constrained method, the correlation
coefficient, which represents D_xy, is arrived at in the following
manner. For each item rated by both users, A is subtracted
from the rating given to the item by user x and the
rating given to that same item by user y. Those differences
are then multiplied. The summed product of rating differences
is divided by the product of two sums. The first sum
is the sum of the squared differences of the predefined
average rating value, A, and the rating given to each item by
user x. The second sum is the sum of the squared differences
of the predefined average value, A, and the rating given to
each item by user y. This method is expressed mathematically
by:

where |C_xy| represents the number of items rated by both
users.
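
The constrained method can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented expression: profiles are assumed to be dictionaries of item identifier to rating, both sums are assumed to run over mutually rated items, and the function and parameter names are hypothetical. The denominator follows the prose literally (the product of the two sums); the optional `take_sqrt` flag, which is not stated in the text, takes the square root of that product as in the conventional Pearson normalization.

```python
import math
from typing import Dict

Ratings = Dict[str, float]  # hypothetical profile layout: item id -> rating


def constrained_correlation(x: Ratings, y: Ratings, a: float,
                            take_sqrt: bool = False) -> float:
    """Correlation of two users' ratings around a fixed value `a`."""
    common = x.keys() & y.keys()  # assumption: only mutually rated items count
    numerator = sum((x[i] - a) * (y[i] - a) for i in common)
    sum_x = sum((x[i] - a) ** 2 for i in common)
    sum_y = sum((y[i] - a) ** 2 for i in common)
    denominator = sum_x * sum_y  # product of the two sums, as described
    if denominator == 0:
        raise ValueError("no deviation from the constraint value to correlate")
    if take_sqrt:
        denominator = math.sqrt(denominator)  # conventional normalization (assumed)
    return numerator / denominator


user_x = {"item1": 7, "item2": 5, "item3": 1}
user_y = {"item1": 5, "item2": 5, "item4": 3}
print(constrained_correlation(user_x, user_y, a=4.0))
```

With the square root applied the result is bounded between -1 and 1, which makes it easier to compare against a fixed threshold.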

A third method for calculating the similarity factor
between users attempts to factor into the calculation the
degree of profile overlap, i.e. the number of items rated by
both users compared to the total number of items rated by
either one user or the other. Thus, for each item rated by both
users, the rating given to an item by user y is subtracted from
the rating given to that same item by user x. These differences
are squared and then summed. The amount of profile
overlap is taken into account by dividing the sum of squared
rating differences by a quantity equal to the number of items
mutually rated by the users subtracted from the sum of the
number of items rated by user x and the number of items
rated by user y. This method is expressed mathematically
by:

\[ D_{xy} = \frac{\sum_{i \in C_{xy}} (H_{ix} - H_{iy})^2}{|C_x| + |C_y| - |C_{xy}|} \]

where C_x represents all items rated by x, C_y represents all
items rated by y, and C_xy represents all items rated by both
x and y.
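
The overlap-adjusted method differs from the earlier squared-difference methods only in its divisor. A short sketch, again assuming dictionary profiles of item identifier to rating and a hypothetical function name:

```python
from typing import Dict

Ratings = Dict[str, float]  # hypothetical profile layout: item id -> rating


def overlap_adjusted_factor(x: Ratings, y: Ratings) -> float:
    """Sum of squared differences divided by the number of items rated by either user."""
    common = x.keys() & y.keys()
    if not common:
        raise ValueError("the users have no mutually rated items")
    total = sum((x[i] - y[i]) ** 2 for i in common)
    # items rated by x, plus items rated by y, minus items rated by both
    rated_by_either = len(x) + len(y) - len(common)
    return total / rated_by_either


user_x = {"item1": 7, "item2": 5, "item3": 1}
user_y = {"item1": 5, "item2": 5, "item4": 3}
print(overlap_adjusted_factor(user_x, user_y))
```

Because the divisor grows with every item that only one user has rated, two users with the same squared differences but little profile overlap receive a smaller factor than under the plain average.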



The additional information included in an n-tuple may also
be used when calculating the similarity factor between two 
users. For example, the information may be considered 
separately in order to distinguish between users, e.g. if a user 
tends to rate items only at night and another user tends to rate 
items only during the day, the users may be considered 
dissimilar to some degree, regardless of the fact that they 
may have rated an identical set of items identically. 
Alternatively, if the additional information is being used as 
a confidence factor as described above, then the information 
may be used in at least two ways. 

In one embodiment, only item ratings that have a confi- 
dence factor above a certain threshold are used in the 
methods described above to calculate similarity factors 
between users. 

In a second embodiment, the respective confidence factors
associated with ratings in each user's profile may be
factored into each rating comparison. For example, if a first
user has given an item a rating of "7" which has a high
confidence factor, but a second user has given the same item
a rating of "7" with a low confidence factor, the second
user's rating may be "discounted." For example, the system
may consider the second user as having a rating of "4" for
the item instead of "7." Once ratings are appropriately
"discounted", similarity factors can be calculated using any
of the methods described above.
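
Both uses of a confidence factor can be illustrated with a short Python sketch. The text does not say how a rating is discounted, so the rule below (pulling a rating toward a neutral midpoint in proportion to a confidence value between 0 and 1) is purely an assumption, as are the function names, the 0-to-1 confidence scale, and the neutral value of 4 on a 1-to-7 scale.

```python
from typing import Dict, Tuple

# Hypothetical profile layout: item id -> (rating, confidence in [0, 1]).
RatedProfile = Dict[str, Tuple[float, float]]


def keep_confident(profile: RatedProfile, threshold: float) -> Dict[str, float]:
    """First embodiment: keep only ratings whose confidence is above the threshold."""
    return {item: r for item, (r, c) in profile.items() if c > threshold}


def discount(rating: float, confidence: float, neutral: float = 4.0) -> float:
    """Second embodiment (assumed rule): shrink a rating toward `neutral`."""
    return neutral + confidence * (rating - neutral)


def discounted(profile: RatedProfile) -> Dict[str, float]:
    """Apply the assumed discount rule to every rating in a profile."""
    return {item: discount(r, c) for item, (r, c) in profile.items()}


profile = {"item1": (7, 0.9), "item2": (7, 0.0)}
print(keep_confident(profile, threshold=0.5))
print(discounted(profile))
```

Under this rule a "7" held with zero confidence is treated as a "4", which matches the example in the text; the resulting dictionaries can then be passed to any similarity method.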

Regardless of the method used to generate them, or
whether the additional information contained in the profiles
is used, the similarity factors are used to select a plurality of
users that have a high degree of correlation to a user (step
106). These users are called the user's "neighboring users."
A user may be selected as a neighboring user if that user's
similarity factor with respect to the requesting user is better
than a predetermined threshold value, L. The threshold
value, L, can be set to any value which improves the
predictive capability of the method. In general, the value of
L will change depending on the method used to calculate the
similarity factors, the item domain, and the number of
ratings that have been entered. In another
embodiment, a predetermined number of users are selected
from the users having a similarity factor better than L, e.g.
the top twenty-five users. For embodiments in which confidence
factors are calculated for each user-user similarity
factor, the neighboring users can be selected based on having
both a similarity factor better than L and a confidence factor
higher than a second predetermined threshold.
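
The three selection rules in this paragraph can be combined in one small function. The sketch below is illustrative only: it assumes that a smaller similarity factor is "better" (as with the squared-difference methods), that factors and confidence values are held in dictionaries keyed by user identifier, and all names are hypothetical.

```python
from typing import Dict, List, Optional


def select_neighbors(factors: Dict[str, float],
                     threshold: float,
                     top_n: Optional[int] = None,
                     confidence: Optional[Dict[str, float]] = None,
                     min_confidence: float = 0.0) -> List[str]:
    """Pick neighboring users whose similarity factor is better than `threshold`.

    factors    -- candidate user id -> similarity factor (smaller is better)
    top_n      -- if given, keep only the best `top_n` users
    confidence -- optional user id -> confidence in that similarity factor
    """
    candidates = [
        user for user, d in factors.items()
        if d < threshold
        and (confidence is None or confidence.get(user, 0.0) > min_confidence)
    ]
    candidates.sort(key=lambda user: factors[user])  # best (smallest) first
    return candidates if top_n is None else candidates[:top_n]


factors = {"u1": 0.5, "u2": 2.5, "u3": 1.0, "u4": 0.2}
print(select_neighbors(factors, threshold=2.0))
print(select_neighbors(factors, threshold=2.0, top_n=2))
```

For a correlation-style factor, where larger is better, the comparison and sort order would be reversed.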

In some embodiments, users are placed in the rating user's
neighbor set based on considerations other than the similarity
factor between the rating user and the user to be added
to the set. For example, the additional information associated
with item ratings may indicate that whenever User A has
rated an item highly, User B has sampled that item and also
liked it considerably. The system may assume that User B
enjoys following the advice of User A. However, User A
may not be selected for User B's neighbor set using the
methods described above due to a number of reasons,
including that there may be a number of users in excess of
the threshold, L, which highly correlate with User B's
profile. These highly correlated users will fill up User B's
neighbor set regardless of their use in recommending new
items to User B.

Alternatively, certain users may not be included in a
neighbor set because their contribution is cumulative. For
example, if a user's neighbor set already includes two users
that have rated every Dim Sum restaurant in Boston, a third
user that has rated only Dim Sum restaurants in Boston
would be cumulative, regardless of the similarity factor
calculated for that user, and another user who has rated
different items in a different domain may be included
instead.

Another embodiment in which neighbors may be chosen
[...] settings, a user may desire to explore a new domain of items.
However, the user's neighbors may not have explored that
domain sufficiently to provide the user with adequate
recommendations for items to sample. In this situation, users
may be selected for the exploring user's neighbor set based
on various factors, such as the number of items they have
rated in the domain which the user wants to explore. This
may be done on the assumption that a user that has rated
many items in a particular domain is an experienced guide
to that domain.

A user's neighboring user set should be updated each time
that a new rating is entered by, or inferred for, that user. In
many applications it is desirable to reduce the amount of
computation required to maintain the appropriate set of
neighboring users by limiting the number of user profiles
consulted to create the set of neighboring users. In one
embodiment, instead of updating the similarity factors
between a rating user and every other user of the system
(which has computational order of n^2), only the similarity
factors between the rating user and the rating user's
neighbors, as well as the similarity factors between the
rating user and the neighbors of the rating user's neighbors,
are updated. This limits the number of user profiles which
must be compared to m^2 minus any degree of user overlap
between the neighbor sets, where m is a number smaller than
n. In this embodiment, similar users are selected in any
manner as described above, such as a similarity factor
threshold, a combined similarity factor-confidence factor
threshold, or solely on the basis of additional information
contained in user profiles.

Once a set of neighboring users is chosen, a weight is
assigned to each of the neighboring users (step 108). In one
embodiment, the weights are assigned by subtracting the
similarity factor calculated for each neighboring user from
the threshold value and dividing by the threshold value. This
provides a user weight that is higher, i.e. closer to one, when
the similarity factor between two users is smaller. Thus,
similar users are weighted more heavily than other, less
similar, users. In other embodiments, the confidence factor
can be used as the weight for the neighboring users. Users
that are placed into a neighbor set on the basis of other
information, i.e. "reputation" or experience in a particular
domain, may have an appropriate weight selected for them.
For example, if a user is selected because of their experience
with a particular domain, that user may be weighted very
highly since it is assumed that they have much experience
with the items to be recommended. The weights assigned to
such users may be adjusted accordingly to enhance the
recommendations given to the user.

Once weights are assigned to the neighboring users, an
item is recommended to a user (step 110). For applications
in which positive item recommendations are desired, items
are recommended if the user's neighboring users have also
rated the item highly. For an application desiring to warn
users away from items, items are displayed as recommended
against when the user's neighboring users have also given
poor ratings to the item. Once again, although specialized
hardware may be provided to select and weight neighboring
users, an appropriately programmed general-purpose computer
may provide these functions.

Referring to both FIGS. 1 and 2, the method just described
can be further optimized for data sets having a large number
of items, a large number of users, or both. In general, the
profile matrix shown in FIG. 2 will be a sparse matrix for
data sets having a large number of items or users. Since, as
described above, it is desirable to reduce computational load
on the system by first accessing item profiles to determine a
set of users that have rated the item, the matrix of FIG. 2
[...] determine if the user represented by [...] users that have
rated the item would be generated, and that list of users would
determine which of the newly-rating user's similarity factors
should be updated. Alternatively, an item column could be
accessed to determine which users have rated the item and,
therefore, which of the newly-rating user's similarity factors
must be updated.

In systems servicing a large number of users, however,
contention for profile matrix data can become acute. This
results from the retrieval patterns of the similarity factor
algorithms described above. First, an item profile is accessed
to determine which users have rated an item. Once the users
that have previously rated the item are determined, each of
their user profiles must be accessed so that the similarity
factor between the newly-rating user and each of the
previously-rating users can be calculated. If the profile data
is provided only as a set of user n-tuples, the first step of
accessing item profiles is not efficient, since each user
n-tuple must be accessed to generate a list of users that have
rated an item. Similarly, if the profile data is provided only
as a set of item n-tuples, then the next step of accessing user
profiles is inefficient, since each item profile must be
accessed to determine which users have rated the item.

In order to efficiently service a system having a large
number of items or a large number of users, it is desirable to
store both a set of user n-tuples and a set of item n-tuples.
User n-tuples are accessed whenever information related to
how the user has rated items in the domain is required, and
item n-tuples are accessed whenever information related to
how users have rated the item is required. This also allows
the item profile data to be accessed concurrently with the
user profile data. As noted above, the n-tuples may store
rating information or they may store pointers to rating
information.

In some embodiments it is useful to store the respective
sets of n-tuples on separate servers in order to provide a
degree of fault tolerance. In order to further increase
efficiency, user n-tuples may be stored on a first collection of
servers which act as a distributed, shared database for user
n-tuples and item n-tuples may be stored on a second
collection of servers which act as a shared, distributed
database for item n-tuples.

An example of how such a system would operate follows.
A first user submits a rating for a first item. The new rating
information is stored both in the user's n-tuple and the item's
n-tuple. In order to update the first user's similarity factors,
the system accesses that item's profile and determines that
3,775 other users of the system have also rated that item. The
system begins updating the first user's similarity factors by
retrieving the first user's profile as well as the profile of one
of the 3,775 users of the system that have already rated the
item. The updated similarity factor between these two users
is calculated using any of the methods described above.
While the system is updating the first user's similarity
factors, a second user submits a rating for a second item. The
system stores the new rating information in both the second
user's n-tuple as well as the second item's n-tuple, and
accesses the second item's profile. This can be done
simultaneously with the system accessing another user profile,
because the data is stored as separate sets of n-tuples, as
described above.

While the system is calculating the new similarity factors
for the first two users, the system determines that similarity
factors for a third user need to be updated. When the system
attempts to access the item profiles to determine other users
to use in calculating similarity factors, however, the system
is unable to access them because the server hosting the item
profile data has crashed. The system redirects its request for
the item profiles to the server hosting the user n-tuple data.
This allows the system to continue operation, even though
this method of generating the item profile information is less
efficient. As noted above, multiple servers may host user or
item n-tuples in order to minimize the frequency of this
occurrence.

Concept information may also be used to generate item-item
similarity metrics, which are used to respond to a user
request to identify other items that are similar to an item the
user has sampled and enjoyed. Since each item has a concept
mask which identifies to which concepts it belongs, item-item
similarity metrics may be generated responsive to item
concept mask overlaps. For example, if each of two items
belongs to five concepts, and two of the five concepts are
overlapping, i.e. both items belong to those two, a degree of
item overlap may be calculated by dividing the number of
overlapping concepts, in this example two, by the total
number of concepts to which both items belong, in this
example 10. The actual method of arriving at a value for
item concept mask overlap will vary depending on various
factors such as domain, number of items, number of
concepts, and others.

Another method for generating an item-item similarity
metric is to determine the similarity of ratings given by users
to both items. In general, rating similarity is determined
using the same techniques as described above in relation to
similarity factors for each user that has rated both items. The
item-item opinion similarity metric may be a single number,
as described above in relation to automated collaborative
filtering, or it may be concept-based, i.e. an item may have
an item-item opinion similarity metric which consists of a
vector of similarity factors calculated on a per-concept basis.
In other embodiments both the concept overlap metric and
the opinion similarity metric may be used together, generally
in any manner that further refines the accuracy of the
recommendation process. The item to be recommended may
be selected in any fashion, so long as the ratings of the
neighboring users, their assigned weights, and the confidence
factors, if any, are taken into account. In one
embodiment, a rating is predicted for each item that has not
yet been rated by the user. This predicted rating can be
arrived at by taking a weighted average of the ratings given
to those items by the user's neighboring users. A predetermined
number of items may then be recommended to the
user based on the predicted ratings.

Recommendations may also be generated using the additional
information associated with the user ratings or the
confidence factors associated with the similarity factors
calculated between a user and the user's neighbors. For
example, the additional information may be used to discount
the rating given to items. In this embodiment, the additional
information may indicate that a rating is possibly invalid or
old, and could result in that rating being weighted less than
other ratings. The additional information may be expressed
as a confidence factor and, in this embodiment, items are
recommended only if the user's neighboring user both
recommends them highly and there is a high confidence
factor associated with that user's rating of the item.

The predetermined number of items to recommend can be
selected such that those items having the highest predicted
rating are recommended to the user or the predetermined
number of items may be selected based on having the lowest
predicted rating of all the items. Alternatively, if a system
has a large number of items from which to select items to
recommend, confidence factors can be used to limit the
amount of computation required by the system to generate
a recommendation. For example, the system can select the
first predetermined number of items that are highly rated by
the user's neighbors for which the confidence factor is above
a certain threshold.

Recommendations can take any of a number of forms. For
example, recommended items may be output as a list, either
printed on paper by a printer, visually displayed on a display
screen, or read aloud.

The user may also select an item for which a predicted
rating is desired. A rating that the user would assign to the
item can be predicted by taking a weighted average of the
ratings given to that item by the user's neighboring users.

Information about the recommended items can be displayed
to the user. For example, in a music domain, the
system may display a list of recommended albums including
the name of the recording artist, the name of the album, the
record label which made the album, the producer of the
album, "hit" songs on the album, and other information. In
the embodiment in which the user selects an item and a
rating is predicted for that item, the system may display the
actual rating predicted, or a label representing the predicted
rating. For example, instead of displaying 6.8 out of a
possible 7.0 for the predicted rating, a system may instead
display "highly recommended". Embodiments in which a
confidence factor is calculated for each prediction may
display that information to the user, either as a number or a
label. For example, the system may display "highly
recommended - 85% confidence" or it may display "highly
recommended - very sure."

In one embodiment, items are grouped in order to help
predict ratings and increase recommendation certainty. For
example, in the broad domain of music, recordings may be
grouped according to various genres, such as "opera,"
"pop," "rock," and others. Groups, or "concepts," are used
to improve performance because predictions and recommendations
for a particular item may be made based only on the
ratings given to other items within the same group. Groups
may be determined based on information entered by the
users, however it is currently preferred to generate the
groups using the item data itself.

Generating the groups using the item data itself can be
done in any manner which groups items together based on
some differentiating feature. For example, in the item
domain of music recordings, groups could be generated
corresponding to "pop," "opera," and others.

A particular way to generate groups begins by randomly
assigning all items in the database to a number of groups.
The number of desired groups can be predetermined or
random. For each initial group, the centroid of the ratings
for items assigned to that group is calculated. This can be
done by any method that determines the approximate mean
value of the spectrum of ratings contained in the item
profiles assigned to the initial group, such as eigenanalysis.
It is currently preferred to average all values present in the
initial group.

After calculating the group centroids, determine to which
group centroid each item is closest, and move it to that
group. Whenever an item is moved in this manner, recalculate
the centroids for the affected groups. Iterate until the
distance between all group centroids and items assigned to
each group is below a predetermined threshold or until a
certain number of iterations have been accomplished.

Groups, or concepts, may be deduced from item
information, as described above, or the system may define a
set of concepts based on a predetermined standard. For
example, a system providing movie recommendations may
elect to use a set of concepts which correspond to established
movie genres. Concepts may be used to improve the
recommendation accuracy of the system in the manner
described below.

Each item in the domain has at least one, and perhaps
many, concepts with which it is associated. For example, a
movie may be associated with both a "romantic" concept
and a "comedy" concept. Items can be associated with
concepts by an item-to-concept map, which consists of a list
of all the concepts, each of which is associated with a list of
items that belong to that concept. In some embodiments it
may be desirable to place an upper limit on the number of
concepts with which an item may be associated.

Each user of the system has a number of interests that is
represented by a "concept mask." A concept mask can be
generated by examining the user's profile, i.e. the items the
user has rated and the ratings the user has given to those
items. A user's concept mask can be implemented as any
data object that associates the user with one or more
concepts, such as an array or a linked list. Since each item
is associated with one or more concepts, each rating given
to an item by a user indicates some interest in the concepts
with which that item is associated. A user's concept mask
can be generated by taking into account the items that the
user has rated and the ratings given to the items.

In one embodiment, each rating given to an item by the
user increases the value of any concept with
which the rated item is associated, i.e. the value for any
concept is the sum of ratings given by the user to individual
items which belong to the concept. For example, a user rates
two items. The first item is associated with concepts A, B,
and C and the user has assigned a rating of "3" to this item.
The second item is associated with concepts B, C, and D and
the user has assigned a rating of "7" to this item. The list of
concepts from which the user's concept mask could be
generated would include A, B, C, and D, and concept A
would be assigned a value of three, concept B would be
assigned a value of ten, concept C would be assigned a value
of ten, and concept D would be assigned a value of seven.
In some embodiments these values may be treated as
weights which signify the importance a user assigns to a
concept, i.e. the degree of interest the user has in a particular
concept. The actual method of generating user concept
masks will vary depending on the application, the domain,
or the number of features present in the system. In general,
any method of generating concept masks that takes into
account, in a meaningful way, the ratings assigned to items
by the user will generate an acceptable concept mask.

A user's concept mask may include every concept with
which items rated by the user are associated, or only the
highest valued concepts may be used. Using the example
above, the user's concept mask may include concepts A, B,
C, and D, or it may only include concepts B and C, since
they were the highest valued concepts. Alternatively, a
predetermined upper limit can be set on the number of
concepts in which a user may have an interest in order to
simplify the domain space. The actual method for selecting
concepts for the user concept mask will vary depending on
the application and the domain. Succinctly, a user's concept
mask identifies a set of concepts in which the user is
interested and an item's concept mask identifies a set of
concepts to which the item belongs.

The user's concept mask is stored in addition to the
item-rating n-tuples described above. For simplicity, whenever
reference is made to a "user profile," it should be
understood to refer to rating-item n-tuples as well as concept
information. Referring once again to FIG. 1, user profiles are
accessed in order to calculate a similarity factor for each user
with respect to all users (step 104). In a system employing
concepts, or grouping of items, within a domain, similarity
factors between users can be provided on a per-concept, i.e.
per-group, basis. That is, a similarity factor between two
users consists of a vector of entries, each entry representing
a similarity factor between those two users for a group of
items, or concepts, in which they both have an interest. For
example, two users having five concepts in each of their
concept masks would have a similarity factor with respect to
the other user that would have five values, one for each
concept. If one of the two users had a concept in his or her
concept mask that the other user did not, then no similarity
factor for that concept could be calculated for those two
users. The per-concept similarity factors may be calculated
using any of the methods described earlier, except that only
items which belong to the concept for which the similarity
factor is generated will be used.

As above, similarity factors between users may be recalculated
when new ratings are received for items,
periodically, or in response to some other stimulus.
Similarly, any of the methods described above to reduce
computational load while calculating similarity factors may
also be advantageously used in these embodiments. If a
similarity factor calculated between two users for a specific
concept is negative, then it may be ignored. The similarity
factor could be explicitly set to zero, i.e. "zeroed out," or the
similarity factor could simply be ignored, i.e. it could be
assigned a weight of zero. Assigning a negative similarity
factor a weight of zero, instead of explicitly setting it to zero,
would allow the similarity factor to be used in special cases,
such as the case of warning the user away from certain items.
Weights associated with concepts in a user's concept mask
may be used to weight individual concept similarity factors
in the similarity factor vector.

Once similarity factor vectors have been calculated, a set
of neighboring users must be selected (step 106). The set of
neighboring users is selected using any method which takes
into account the similarity factor vectors. A user's neighboring
user set may be populated responsive to the amount
of overlap between two users' concept masks, the number of
items which they have rated similarly in any concept they
have in common, or both. For example, neighbors may be
selected by summing the individual entries in the similarity
factor vector calculated for each user. The users having the
greatest total could form the user's neighbor set. In general,
any method for selecting neighbors that uses the similarity
factor vector information in some meaningful way will
result in an appropriate selection of neighbors, and whatever
method is used may be adjusted from time to time to
increase recommendation accuracy.

Additionally, users may be placed in the rating user's
neighbor set based on considerations other than the similarity
factor vector between the users. Alternatively, certain
users may not be included in a neighbor set because their
contribution to the set is cumulative. For example, if a user's
neighbor set already includes two users that have a high
degree of concept overlap with respect to three concepts, but
no concept overlap with respect to a fourth concept, it would
be desirable to include a user in the neighboring user set
which has a concept overlap with respect to the fourth
concept rather than another user that has a high degree of
concept overlap with the first, second, or third concepts.

Once the set of neighboring users is chosen, a weight is
assigned to each of the neighboring users (step
108). Weights may be assigned responsive to the amount of
concept overlap between the users, the amount of rating
similarity between the users for items in overlapping
concepts, or both. For example, in the example above users
were selected as neighbors based on the sum of their
similarity factor vector entries; these totals could be normalized
to produce a weight for each neighboring user, i.e.
the user having the highest total would be given a weight of
one, the next highest user would have weight slightly less
than one, etc. Users that are placed into a neighbor set on the
basis of experience in a particular grouping of items, i.e.
concept, may have an appropriate weight selected for them.

Recommendations may be generated for all items in a
domain, or only for a particular group of items. Recommendations
for items within a particular group or concept of
items are accomplished in the same way as described above,
the main difference being that only ratings assigned to items
within the group by users in the neighboring user set will be
used to calculate the similarity factor.

For embodiments in which recommendations will be
made for any item in the domain, the system performs an
intersection of the set of items rated by all of the neighboring
users with the set of items that belong to the concepts
included in the concept mask of the user for which the
recommendation will be generated. Once the intersection set
has been generated, an item or items to be recommended is
selected from the set, taking into account the ratings given
to the item by the neighboring users, the weights assigned to
the neighboring users, and any additional information that
may be included. For a particular item, only the user's
neighboring users that have rated the item are taken into
account, although if only a small number of neighboring
users have rated the item, this information may be used to
"discount" the recommendation score generated. Similarly,
any weighting assigned to particular concepts present in the
user's concept mask or any additional information or confidence
factors associated with the similarity factor vectors
may also be used to discount any recommendation score
generated. The number of items to recommend may be
determined using any of the methods described above.

As described above, the user may request that the system

particular item or set of items highly. In this embodiment, if
the communication is to be targeted at users that have rated
a particular item highly, then the profile for that item is
retrieved from memory and users which have rated the item
highly are determined. The determination of users that have
[...] any number of ways. For
example, a threshold value may be set and users which
[...] in excess of that threshold
[...]. Alternatively, if the communication is to be targeted at
[...] highly, then each profile
[...] is to be considered can be retrieved from
the memory element and a composite rating of items may be
produced for each user. The composite rating may be a
weighted average of the individual ratings given to the items
by a user; each item may be weighted equally with all the
other items or a predetermined weight may be assigned to
each individual item. In this embodiment, once a composite
rating for each user has been determined, then targeted users
are selected. This selection may be done by setting a
predetermined threshold which, when a user's composite
rating is in excess of, indicates that user is a targeted user.

In either embodiment, once targeted users are selected,
the communication is displayed on that user's screen whenever
the user accesses the system. In other embodiments the
communication may be a facsimile message, an electronic
mail message, or an audio message.

In a second embodiment, the communication which is to
be targeted to selected users may seek out its own receptive
users based on information stored in the user profiles and
ratings given to the communication by users of the system.
In this embodiment, the communication initially selects a set
of users to which it presents itself. The initial selection of
users may be done randomly, or the communication may be
"preseeded" with a user profile which is its initial target.
Once a communication presents itself to a user or set of
users, it requests a rating from that user or users. Users may
then assign a rating to the communication in any of the ways
described above. Once a communication receives a rating or
ratings from users, the communication determines a new set
of users to which it presents itself based on the received
rating. One way the communication does this is to choose
the neighbors of users that have rated it highly. In another
embodiment, the communication analyzes the ratings it has
received to determine the ideal user profile for a hypothetical

predict a rating for a selected item. The rating is predicted by user in the second set of users to which it will present itself, 

taking a weighted average of the rating given to that item by The communication does this by retrieving from memory 

the users in the neighboring user set, and concept mask the user profiles of each user that has given it a rating. The 

techniques just described may be used in addition to the communication then analyzes those user profiles to deter- 

method described above to further refine the predicted so mine characteristics associated with users that have given it 

rating. a favorable rating. 

Whether or not grouping is used, a user or set or users may The conununication may assume that it can infer more 

be recommended to a user as having similar taste in items of from looking at items that users have rated favorably or it 

a certain group. In this case, the similarity factors calculated may instead attempt to gather information based on items 

from the user profiles and item profiles are used to match 55 that those users have rated unfavorably. Alternatively, some 

similar users and introduce them to each other. This is done selection of items in a group may be used to determine 

by recommending one user to another in much the same way characteristics of favorable user profiles. In this 

that an item is recommended to a user. It is possible to embodiment, the communication may perform a similarity 

mcrease the recommendation certainty by including the factorcalculationusingany of the methods described above, 

nunaber of items rated by both users in addition to the 60 The set of neighboring users is the set of users to which the 

similarity factors calculated for the users. communication will present itself. 

The user profiles and, if provided, item profiles may be Once the communication has presented itself to the sec- 
used to allow communication to be targeted to specific users ond set of users, the series of steps repeats with the new 
that will be most receptive to the communication. This may usees rating the communication and the communication 
be done in at least two ways. es using that infomiation to fiirther refine its ideal user to which 
In a first embodiment, a communication is provided it will present itself. In some embodiments, a limit may be 
which is intended to be deHvered to users that have rated a placed the number of users to which a communication may 
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preseot itself in the form of tokens which the communication 
spends to present itself to a user, perform a similarity factor 
calculation, or other activities on the system. For example, 
a communication may begin with a certain number of 
tokens. For each user that it presents itself to, the commu- 
nication must spend a token. The commimication may be 
rewarded for users who rate it highly by receiving more 
tokens from the system than it had to pay to present itself to 
that user. Also, a communication may be penalized for 
presenting itself to users who give it a low rating. This 
penalty may take the form of a required payment of addi- 
tional tokens or the commimication may simplify not receive 
tokens for the poor rating given to it by the user. Once the 
communication is out of tokens, it is no longer active on the 
system. 
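The token scheme described above can be illustrated with a minimal sketch. The class name, the rating thresholds, and the cost, reward and penalty amounts are all assumptions chosen for illustration; the text does not fix a rating scale or any token amounts.

```python
from dataclasses import dataclass


@dataclass
class Communication:
    """Hypothetical model of a communication that pays tokens to present itself."""

    tokens: int

    @property
    def active(self) -> bool:
        # A communication with no tokens left is no longer active on the system.
        return self.tokens > 0

    def present(self, rating: float, *, cost: int = 1, high: float = 5.0,
                low: float = 3.0, reward: int = 2, penalty: int = 1) -> None:
        """Spend `cost` tokens to reach one user, then settle on that user's rating.

        All numeric defaults are illustrative assumptions, not values from the text.
        """
        if not self.active:
            raise RuntimeError("communication is out of tokens")
        self.tokens -= cost
        if rating >= high:
            # Rewarded with more tokens than the presentation cost.
            self.tokens += reward
        elif rating <= low:
            # Penalized by an additional payment, never dropping below zero.
            self.tokens = max(0, self.tokens - penalty)
        # Ratings between the two thresholds earn nothing back.
```

With these assumed defaults, a communication that starts with three tokens and receives one high rating followed by two low ratings ends with no tokens and becomes inactive.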

Grouping, or subdividing the domain into concepts, as
described above, is a special case of "feature-guided automated
collaborative filtering" when there is only a limited
number of features of interest. The method of the present
invention works equally well for item domains in which the
items have many features of interest, such as World Wide
Web pages.

The method using feature-guided automated collaborative
filtering incorporates feature values associated with items in
the domain. The term "feature value" is used to describe any
information stored about a particular feature of the item. For
example, a feature may have boolean feature values indicating
whether or not a particular feature exists or does not
exist in a particular item.

Alternatively, features may have numerous values, such
as terms appearing as "keywords" in a document. In some
embodiments, each feature value can be represented by a
vector in some metric space, where each term of the vector
corresponds to the mean score given by a user to items
having the feature value.

Ideally, it is desirable to calculate a vector of distances
between every pair of users, one for each possible feature
value defined for an item. This may not be possible if the
number of possible feature values is very large, i.e., keywords
in a document, or the distribution of feature values is
extremely sparse. Thus, in many applications, it is desirable
to cluster feature values. The terms "cluster" and "feature
value cluster" are used to indicate both individual feature
values as well as feature value clusters, even though feature
values may not necessarily be clustered.

Feature value clusters are created by defining a distance
function Δ, defined for any two points in the vector space, as
well as vector combination function Ω, which combines any
two vectors in the space to produce a third point in the space
that in some way represents the average of the points.
Although not limited to the examples presented, three possible
formulations of Δ and Ω are presented below.

The notion of similarity between any two feature values
is how similarly they have been rated by the same user,
across the whole spectrum of users and items. One method
of defining the similarity between any two feature values is
to take a simple average. Thus, we define the value $v_i^{\alpha x}$ to
be the mean of the rating given to each item containing
feature value $FV_\alpha^x$ that user i has rated. Expressed
mathematically:

$$
v_i^{\alpha x} =
\begin{cases}
\dfrac{\sum_{p} R_{ip}\,\Gamma_p^{\alpha x}}{\sum_{p} \Gamma_p^{\alpha x}} & \text{if } \sum_{p} \Gamma_p^{\alpha x} > 0 \\
\text{Undefined} & \text{otherwise}
\end{cases}
$$

where the sums run over the items p that user i has rated,
$R_{ip}$ is the rating given by user i to item p, and
$\Gamma_p^{\alpha x}$ indicates the presence or absence of feature
value $FV_\alpha^x$ in item p. Any distance metric may be used to
determine the per-user dimension squared distance between
the vectors for feature value $\alpha_x$ and feature value $\alpha_y$ for user i. For
example, any of the methods referred to above for calculating
user similarity may be used.
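As an illustration of the simple-average formulation above, the following sketch builds the per-user entry of a feature value vector. The data layout (nested dictionaries keyed by user and item) and all function names are assumptions made for this example; `None` stands in for "Undefined".

```python
from typing import Dict, List, Optional, Sequence, Set

Ratings = Dict[str, Dict[str, float]]   # user -> item -> rating
ItemFeatures = Dict[str, Set[str]]      # item -> feature values present in it


def mean_rating_for_feature_value(ratings: Ratings, item_features: ItemFeatures,
                                  user: str, feature_value: str) -> Optional[float]:
    """Mean of the user's ratings over rated items that contain the feature value.

    Returns None when the user has rated no item containing the feature value.
    """
    scores = [rating for item, rating in ratings.get(user, {}).items()
              if feature_value in item_features.get(item, set())]
    if not scores:
        return None
    return sum(scores) / len(scores)


def feature_value_vector(ratings: Ratings, item_features: ItemFeatures,
                         users: Sequence[str], feature_value: str) -> List[Optional[float]]:
    """One entry per user, in the order given by `users`."""
    return [mean_rating_for_feature_value(ratings, item_features, u, feature_value)
            for u in users]
```

Each feature value thus becomes a vector with one dimension per user, which is what the distance and combination functions operate on.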

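Defining δ as the per-user dimension squared distance
between two feature values, the total distance between the
two feature value vectors is expressed mathematically as:

$$
\Delta(FV_\alpha^x, FV_\alpha^y) =
\frac{\|Users\|}{\sum_{i=1}^{\|Users\|} \eta_i^{\alpha x}\,\eta_i^{\alpha y}}
\sum_{i=1}^{\|Users\|} \delta_i^{\alpha x,\alpha y}\,\eta_i^{\alpha x}\,\eta_i^{\alpha y}
$$

where the term

$$
\frac{\|Users\|}{\sum_{i=1}^{\|Users\|} \eta_i^{\alpha x}\,\eta_i^{\alpha y}}
$$

represents adjustment for missing data.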

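The combination function for the two vectors, which
represents a kind of average for the two vectors, is expressed
mathematically by the following three equations.

$$
\Omega(FV_\alpha^x, FV_\alpha^y)_i =
\begin{cases}
\dfrac{v_i^{\alpha x} + v_i^{\alpha y}}{2} & \text{if } \eta_i^{\alpha x} = 1 \text{ and } \eta_i^{\alpha y} = 1 \\
v_i^{\alpha x} & \text{if } \eta_i^{\alpha x} = 1 \text{ and } \eta_i^{\alpha y} = 0 \\
v_i^{\alpha y} & \text{if } \eta_i^{\alpha x} = 0 \text{ and } \eta_i^{\alpha y} = 1
\end{cases}
$$

wherein $\eta_i^{\alpha x}$ indicates whether $v_i^{\alpha x}$ is defined.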
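A minimal sketch of a distance function and a combination function for feature value vectors with missing entries follows. It assumes a squared difference as the per-user distance, a scale-up by the ratio of all users to users with both entries defined as the adjustment for missing data, and a midpoint as the "kind of average"; these choices and all names are assumptions for illustration, with `None` marking an undefined entry.

```python
from typing import List, Optional

Vector = List[Optional[float]]  # one entry per user; None means "undefined"


def total_distance(vx: Vector, vy: Vector) -> Optional[float]:
    """Sum of per-user squared differences, adjusted for missing data.

    Only users for whom both entries are defined contribute. The sum is
    scaled by (number of users) / (number of contributing users).
    Returns None if no user has both entries defined.
    """
    shared = [(a - b) ** 2 for a, b in zip(vx, vy)
              if a is not None and b is not None]
    if not shared:
        return None
    return (len(vx) / len(shared)) * sum(shared)


def combine(vx: Vector, vy: Vector) -> Vector:
    """Per-user average of two vectors, falling back to whichever entry exists."""
    combined: Vector = []
    for a, b in zip(vx, vy):
        if a is not None and b is not None:
            combined.append((a + b) / 2)
        elif a is not None:
            combined.append(a)
        else:
            combined.append(b)  # b itself, or None when neither is defined
    return combined
```

A clustering routine can call `total_distance` to decide which feature values are close and `combine` to produce the representative vector of a merged cluster.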

Another method for calculating the similarity between
any two feature values is to assume the number of values
used to compute $v_i^{\alpha x}$ is sufficiently large. If this assumption
is made, the Central Limit Theorem can be used to justify
approximating the distribution of vectors by a Gaussian
distribution.

Since the Gaussian distribution can be effectively characterized
by its mean, variance and sample size, each entry
$v_i^{\alpha x}$ is now a triplet.

is small. A more accurate estimator of the population variance
is given by the term



is the sample mean of the population. 



Dftmjll 



10 



15 



is the variance of the sampling distribution, and



and represents the sample variance, which is an accurate 
estimator of the underlying population variance. 
Accordingly, operator η is redefined as:



1 if[g c,,/r;'|>l 

0 Otherwise 



and the triplet is defined as:



is the sample size. 

The total distance between the two feature value vectors
is expressed mathematically by:



Given the above, the sample variance is represented as: 

2 ii^i4>-i^f^ci,^^^^') 



\\Users\\ 



(lltwil > 
2^ 



30 



The sample variance and the variance of the sample
distribution for a finite population are related by the following
relationship:



The feature value combination function combines the
corresponding triplets from the two vectors by treating them
as Gaussians, and therefore is represented mathematically
by:



which transforms the standard deviation into: 



. erf \ wr-) if = 1 and 77"^ = 0 



40 



where 



represents the mean of the new population. 



45 



50 



Thus, the feature value vector combination function is 
defined as: 



(p:^,5r-^wr^)if^?-=iand^^=i 

</i?\ Sf\ Nf^) if 7?' = 1 and tu^ = 0 
(;i?.5r'.^?) ifrf'=Oand;,? = l 



represents the variance of the combined population, and 
represents the sample size of the combined population. 



The third method of calculating feature value similarity
metrics attempts to take into account the variance of the
sampling distribution when the sample size of the population

Regardless of the feature value combination function
used, the item similarity metrics generated by them are used
to generate feature value clusters. Feature value clusters are
generated from the item similarity metrics using any clustering
algorithm known in the art. For example, the method
described above with respect to grouping items could be
used to group values within each feature.

Feature values can be clustered both periodically and
incrementally. Incremental clustering is necessary when the
number of feature values for items is so large that reclustering
of all feature values cannot be done conveniently.
However, incremental clustering may be used for any set of
items, and it is preferred to use both periodic reclustering
and incremental reclustering.

All feature values are periodically reclustered using any
clustering method known in the art, such as K-means. It is
preferred that this is done infrequently, because of the time
that may be required to complete such a reclustering. In
order to cluster new feature values present in items new to
the domain, feature values are incrementally clustered. New
feature values present in the new items are clustered into the
already existing feature value clusters. These feature values
may or may not be reclustered into another feature value
cluster when the next complete reclustering is done.
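Incremental clustering as described above can be sketched as assigning each new feature value to the nearest existing cluster and leaving the clusters themselves untouched until the next full reclustering. The representation of clusters as named centroid vectors, the squared Euclidean distance, and the function names are assumptions for this example.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_incrementally(new_vector: Sequence[float],
                         centroids: Dict[str, Sequence[float]]) -> str:
    """Return the name of the existing cluster closest to the new feature value."""
    if not centroids:
        raise ValueError("no existing clusters; run a full clustering first")
    return min(centroids, key=lambda name: squared_distance(new_vector, centroids[name]))


def add_feature_value(name: str, vector: Sequence[float],
                      centroids: Dict[str, Sequence[float]],
                      members: Dict[str, List[str]]) -> str:
    """Record a new feature value under its nearest cluster without moving centroids."""
    cluster = assign_incrementally(vector, centroids)
    members.setdefault(cluster, []).append(name)
    return cluster
```

Because centroids are not updated here, assignments can drift from what a full reclustering would produce, which is why the text prefers combining both approaches.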

Using the feature value clusters generated by any one of
the methods described above, a method for recommending
an item, as shown in FIG. 3, uses feature clusters to aid in
predicting ratings and proceeds as the method of FIG. 1, in
that a plurality of user profiles is stored (step 102'). As
above, a plurality of item profiles may also be stored. The
method using feature value clusters assigns a weight to each
feature value cluster and a weight to each feature based on
the user's rating of the item (steps 120 and 122).

A feature value cluster weight for each cluster is calculated
for each user based on the user's ratings of items
containing that cluster. The cluster weight is an indication of
how important a particular user seems to find a particular
feature value cluster. For example, a feature for an item in
a music domain might be the identity of the producer. If a
user rated highly all items having a particular producer (or
cluster of producers), then the user appears to place great
emphasis on that particular producer (feature value) or
cluster of producers (feature value cluster).

Any method of assigning feature value cluster weight that
takes into account the user's rating of the item and the
existence of the feature value cluster for that item is
sufficient; however, it is currently preferred to assign feature
value cluster weights by summing all of the item ratings that
a user has entered and dividing by the number of feature
value clusters. Expressed mathematically, the vector weight
for cluster x of feature α for user I is:



larily between two users, as described above, may be used 
provided they are augmented by the feature weights and 
feature value weights. Thus 



||Fa»imD<fnn/l| 
D,j = ^ FW, 



Don _^ 



jQ represents the similarity between users I and J, where 

j5 is a boolean operator on a vector of values indicating 
whether feature value cluster of x for feature a of the vector 
is defined and where 
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cw. = 



0.0 



iff ^ ^^Kp^yt"^^ 



otherwise 



40 



45 



where $\gamma_p^{\alpha x}$ is a boolean operator indicating whether item p
contains the feature value cluster x of feature α.
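The cluster weight, and the preferred feature weight that the text builds from it (standard deviation of the cluster weights divided by their mean), can be sketched as follows. This is one plausible reading only: the cluster weight is taken here to be the user's average rating over rated items containing the cluster, with 0.0 when there are none, and the population standard deviation is used. Both choices, and all names, are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence, Set


def cluster_weight(user_ratings: Dict[str, float],
                   item_clusters: Dict[str, Set[str]],
                   cluster: str) -> float:
    """Average rating the user gave to items containing `cluster`, else 0.0."""
    scores = [rating for item, rating in user_ratings.items()
              if cluster in item_clusters.get(item, set())]
    return sum(scores) / len(scores) if scores else 0.0


def feature_weight(cluster_weights: Sequence[float]) -> float:
    """Standard deviation of a feature's cluster weights divided by their mean.

    A feature whose clusters the user rates very differently gets a larger
    weight than one whose clusters all receive similar ratings.
    """
    if not cluster_weights:
        return 0.0
    average = mean(cluster_weights)
    if average == 0:
        return 0.0
    return pstdev(cluster_weights) / average
```

For a user who rates one producer's items at 5.0 on average and another's at 2.0, the "producer" feature receives weight 1.5 / 3.5 under these assumptions.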

The feature value cluster weight is used, in turn, to define
a feature weight. The feature weight reflects the importance
of that feature relative to the other features for a particular
user. Any method of estimating a feature weight can be
used; for example, feature weights may be defined as the
reciprocal of the number of features defined for all items. It
is preferred that feature weights are defined as the standard
deviation of all cluster weights divided by the mean of all
cluster weights. Expressed mathematically:



Du = cw, 



20 



0.0 



Sterna 



otherwise 



The representation of an item as a set of feature values
allows the application of various feature-based similarity
metrics between items. Two items may not share any identical
feature values but still be considered quite similar to
each other if they share some feature value clusters. This
allows the recommendation of unrated items to a user based
on the unrated item's similarity to other items which the user
has already rated highly.

The similarity between two items $p_1$ and $p_2$, where $P_1$ and
$P_2$ represent the corresponding sets of feature values possessed
by these items, can be represented as some function,
f, of the following three sets: the common feature
values shared by the two items; the feature values
that $p_1$ possesses that $p_2$ does not; and the feature
values that $p_2$ possesses that $p_1$ does not.
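The three sets named above are straightforward to compute with set operations. The sketch below also includes one possible choice of f, a linear contrast that rewards shared feature values and penalizes distinct ones; the text leaves f open, so this particular function and its weights are purely hypothetical.

```python
from typing import Set, Tuple


def feature_set_partition(p1: Set[str], p2: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared values, values only in p1, values only in p2)."""
    return p1 & p2, p1 - p2, p2 - p1


def contrast_similarity(p1: Set[str], p2: Set[str], *,
                        common_weight: float = 1.0,
                        distinct_weight: float = 0.5) -> float:
    """Hypothetical f: weighted count of shared values minus weighted count of distinct ones."""
    common, only_p1, only_p2 = feature_set_partition(p1, p2)
    return common_weight * len(common) - distinct_weight * (len(only_p1) + len(only_p2))
```

In a personalized variant, the plain counts would be replaced by sums of per-user feature weights and cluster weights over the members of each set.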

Thus, the similarity between two items, denoted by $S(p_1, p_2)$, is represented as:

$$S(p_1, p_2) = f(P_1 \cap P_2,\; P_1 - P_2,\; P_2 - P_1)$$

Each item is treated as a vector of feature value clusters 
and the item-item similarity metrics are defined as: 

m n 2j ^ Z [cw'xyf, xrS) 



mFtatwraPtSotdi 



FH/,x^(cw/xy^^x{l->^)) 



FW, = 



StarulardDevicW^,^ 



60 



Fiy,x2(a//x{i-r;-)x7^) 
0=1 "^"^ 



The feature value cluster weights and the feature weights 
are used to calculate the similarity factor between two users. 
The similarity factor between two users may be calculated 
by any method that takes into account the assigned weights. 
For example, any of the methods for calculating the simi- 



This metric is personalized to each user since the feature
weights and cluster weights reflect the relative importance of
a particular feature value to a user.

Another method of defining item-item similarity metrics attempts to take into account the case where one pair of items has numerous identical feature values, because if two items share a number of identical feature values, they are more similar to each other than two items that do not share feature values. Using this method, $f(P_1 \cap P_2)$ is defined as:

$f(P_1 \cap P_2) = \ldots$ [right-hand side illegible in the source]

Another method for calculating item-item similarity is to treat each item as a vector of feature value clusters and then compute the weighted dot product of the two vectors. Thus, $S(p_1, p_2) = g(P_1 \cap P_2)$ where [definition of g illegible in the source].

In another aspect, the system and method may be used to identify users that will enjoy a particular item. In this aspect, as above, user profiles and item profiles are stored in a memory element, and the user profiles and item profiles record ratings given to items by users. An item profile contains at least an identification of a user and the rating given to that item by that user. The item profile may contain additional information just as described in connection with user profiles. Similarity factors between items are calculated using any of the methods described above. For example, using the squared difference method for calculating similarity factors, the rating given to a first item by User A and the rating given to a second item by User A are subtracted and that difference is squared. This is done for each user that has rated both items. The squared differences are then summed and divided by the total number of users that have rated both items.

This provides an item-item similarity metric and a group of neighboring items is selected in the same way as described above. Those neighboring items are then weighted and a user, or group of users, that will be receptive to a given item are determined. Again, this may be done using any of the methods described above, including using confidence factors, item grouping, or feature guided automated collaborative filtering.

The methods described above can be provided as software on any suitable medium that is readable by a computing device. The software programs means may be implemented in any suitable language such as, C, C++, PERL, LISP, ADA, assembly language or machine code. The suitable media may be any device capable of storing program means in a computer-readable fashion, such as a floppy disk, a hard disk, an optical disk, a CD-ROM, a magnetic tape, a memory card, or a removable magnetic drive.

An apparatus may be provided to recommend items to a user. The apparatus, as shown in FIG. 4, has a memory element 12 for storing user and item profiles. Memory element 12 can be any memory element capable of storing the profiles such as, RAM, EPROM, or magnetic media.

A means 14 for calculating is provided which calculates the similarity factors between users. Calculating means 14 may be specialized hardware to do the calculation or, alternatively, calculating means 14 may be a microprocessor or software running on a microprocessor resident in a general-purpose computer.

Means 16 for selecting is also provided to select neighboring users responsive to the similarity factors. Again, specialized hardware or a microprocessor may be provided to implement the selecting means 16, however preferred is to provide a software program running on a microprocessor resident in a general-purpose computer. Selecting means 16 may be a separate microprocessor from calculating means 14 or it may be the same microprocessor.

A means 18 for assigning a weight to each of the neighboring users is provided and can be specialized hardware, a separate microprocessor, the same microprocessor as calculating means 14 and selecting means 16, or a microprocessor resident in a general-purpose computer and running software.

In some embodiments a receiving means is included in the apparatus (not shown in FIG. 4). Receiving means is any device which receives ratings for items from users. The receiving means may be a keyboard or mouse connected to a personal computer. In some embodiments, an electronic [illegible] area network forms the receiving means. In the preferred embodiment, a World Wide Web Page connected to the Internet forms the receiving means.

Also included in the apparatus is means 20 for recommending at least one of the items to the users based on the weights assigned to the users' neighboring users and the ratings given to the item by the users' neighboring users. Recommendation means 20 may be specialized hardware, a microprocessor, or, as above, a microprocessor running software and resident on a general-purpose computer. Recommendation means 20 may also comprise an output device such as a display, audio output, or printed output.

In another embodiment an apparatus for recommending an item is provided that uses feature weights and feature value weights. This apparatus is similar to the one described above except that it also includes a means for assigning a feature value cluster weight 22 and a means for assigning a feature weight 24 (not shown in FIG. 4). Feature value cluster weight assigning means 22 and feature value weight assigning means 24 may be provided as specialized hardware, a separate microprocessor, the same microprocessor as the other means, or as a single microprocessor in a general purpose computer.

FIG. 5 shows the Internet system on which an embodiment of the method and apparatus may be used. The server 40 is an apparatus as shown in FIG. 4, and it is preferred that server 40 displays a World Wide Web Page when accessed via the Internet 42. Server 40 also accepts input over the Internet 42. Multiple users 44 may access server 40 simultaneously. In other embodiments, the system may be a stand-alone device, e.g. a kiosk, which a user physically approaches and with which the user interacts. Alternatively, the system may operate on an organization's internal web, commonly known as an Intranet, or it may operate via a wireless network, such as satellite broadcast.

EXAMPLE 1

The following example is one way of using the invention, which can be used to recommend items in various domains for many items. By way of example, a new user 44 accesses the system via the World Wide Web. The system displays a welcome page, which allows the user 44 to create an alias to use when accessing the system. Once the user 44 has entered a personal alias, the user 44 is asked to rate a number of items; in this example the items to be rated are recording artists in the music domain.

After the user 44 has submitted ratings for various recording artists, the system allows the user 44 to enter ratings for additional artists or to request recommendations. If the user 44 desires to enter ratings for additional artists, the system can provide a list of artists the user 44 has not yet rated. For the example, the system can simply provide a random listing of artists not yet rated by the user 44. Alternatively, the user 44 can request to rate artists that are similar to recording artists they have already rated, and the system will provide a list of similar artists using the item similarity values previously calculated by the system. The user can also request to rate recording artists from a particular group, e.g. modern jazz, rock, or big band, and the system will provide the user 44 with a list of artists belonging to that group that the user 44 has not yet rated. The user 44 can also request to rate more artists that the user's 44 neighboring users have rated, and the system will provide the user 44 with a list of artists by selecting artists rated by the user's 44 neighboring users.

The user 44 can request the system to make artist recommendations at any time, and the system allows the user 44 to tailor their request based on a number of different factors. Thus, the system can recommend artists from various groups that the user's 44 neighboring users have also rated highly. Similarly, the system can recommend a predetermined number of artists from a particular group that the user will enjoy, e.g. opera singers. Alternatively, the system may combine these approaches and recommend only opera singers that the user's neighboring users have rated highly.

The system allows the user 44 to switch between rating items and receiving recommendations many times. The system also provides a messaging function, so that users 44 may leave messages for other users that are not currently using the system. The system provides "chat rooms," which allow users 44 to engage in conversation with other users 44' that are currently accessing the system. These features are provided to allow users 44 to communicate with one another. The system facilitates user communication by informing a user 44 that another user 44' shares an interest in a particular recording artist. Also, the system may inform a user 44 that another user 44' that shares an interest in a particular recording artist is currently accessing the system; the system will not only inform the user 44, but will encourage the user 44 to contact the other user 44' that shares the interest. The user 44 may leave the system by logging off of the Web Page.

EXAMPLE 2

In another example, the system is provided as a stand-alone kiosk which is used by shoppers in a retail establishment. The kiosk has an output device such as a display screen or printer, and possibly an input device, such as a touch screen or keyboard. The kiosk has a memory element which allows it to store profiles for items and users. In some cases, the kiosk may be provided with a CD-ROM drive for allowing "preseeded" user and item profiles to be loaded into the kiosk.

In this example, a user may approach a kiosk to determine an item which is recommended for them. The user would input their alias from the system of EXAMPLE 1, and the kiosk could access the CD-ROM in order to load the user's profile into memory. The kiosk may also load similarity factors which have been calculated beforehand or the kiosk may calculate the similarity factors now. The kiosk can then [illegible] recommended items which may be printed out for the user, [illegible] the user through an audio device.

The kiosk may also provide the user with directions for how to find recommended items in the retail establishment, or the kiosk may allow the user to purchase the item directly by interacting with the kiosk.

In [illegible] embodiments, the system may include multiple Web sites as described in Example 1 interconnected with [illegible] as described in Example 2. For example, a first Web site may be provided which allows users to rate [illegible] artists, and songs, and a second Web site may be provided which allows users to rate books, magazines, short stories, and other literary works. The two Web sites may be interconnected and, in addition, connected to one or more kiosks as described in Example 2, such as a kiosk provided by a bookstore which [illegible] magazines, and [illegible] provided by a record store which allows users to rate albums, artists, and other items of musical interest.

In the embodiment of the system shown in [illegible], it is generally desired that a user need only initially [illegible] one of the possible entry points, e.g. the music Web site 62, the literary Web site 64, the music kiosk 66, or the literary kiosk 68. If a user has created a profile in one subject area, ratings given by the user to items in that subject area may be used to recommend items in a different area. For example, if a user has logged into the literary Web site and provided a number of ratings for books, it would [illegible] literary ratings when the user logs [illegible] recommendations. A centralized server 70 may be provided which acts as a [illegible] repository for user profile data.

In such systems, the constituent parts, e.g. the music Web site 62, the literary Web site 64, the music kiosk 66, and the literary kiosk 68 are generally interconnected by traditional [illegible] network media and, most probably, are connected [illegible] telephone lines. A retail establishment may use more than one kiosk to service customers. In this case, the [illegible] devices, i.e. front-ends, connected to a kiosk server 80. The kiosk server 80 is connected to the wide-area network and transfers information to the central server 70. Because such wide area network media is subject to a number of environmental factors which may disrupt transmission between two interconnected points, e.g. Web site 64 and central server 70, a mechanism for providing distributed user management must be employed allowing users to create profiles at any entry point in the system and access those profiles from any other point in the system in a highly available manner.

A global user name space is provided by assigning each user a multiple byte identification code. When a user logs on to a site or a kiosk and indicates that he or she desires to create a new user profile (step 702), the user is prompted to enter at least an alias and password (step 704). The user may also be prompted to enter certain demographic information. The node must verify that the alias supplied by the user is not already in use (step 704). One method for verifying the alias is described below in connection with FIG. 8.

If the alias supplied by the user is not already in use then the node verifies whatever demographic data the user supplied (step 708). In embodiments where the user is not prompted to supply any demographic data this step may be skipped.

Demographic data is checked for validity in any one of a number of ways. For example, user supplied demographic data may be compared to demographic data supplied by similar users that have already registered to determine if any values given by the user are outside of ranges given by similar users. Alternatively, demographic data may be
scanned for entries that supply conflicting information, for example, a
demographic data profile stating that the user's age is under 18 and also
that the user is employed as a doctor may be determined invalid. If the
demographic data supplied by the user is not determined to be valid, the
user is prompted to reenter certain information (step 704). The user may
be prompted to reenter the offending demographic value, or the user may
be required to select a new alias, a new password, and prompted for new
demographic data.

If the demographic data supplied by the user is determined to be valid,
the node creates a local identification code for the user (step 712).
Each node is assigned a unique identification code. The identification
code assigned to each node is combined with the four byte identification
code assigned to the user by the node to provide the user with a globally
unique identification code.

In one embodiment, users are assigned an 8 byte user identification code
which consists of 4 bytes identifying the node on which the user created
his or her initial profile and the other 4 bytes of which indicate the
user's selected alias. It would be clear to one of ordinary skill in the
art that the number of bytes used to uniquely identify users could be
enlarged in order to accommodate larger populations of users or of sites
and kiosks. Similarly, it would be clear that the bytes in the
identification code could be unevenly split between users and sites, i.e.
in the example above the user identification code could require 5 bytes
and the node identification code could require only 3 bytes.

[illegible] provided with appropriate memory elements to cache user
[illegible] transmitted to the central server 70. Each node periodically
attempts to connect to the central server 70 to transmit any registration
data which it has collected (step 714). In the [illegible] predetermined
[illegible].

This capability may be implemented as a daemon, i.e. a background process
which executes in a continuous loop and attempts to transmit registration
data to the central server 70. In another embodiment, this capability may
be provided as a thread of execution which continually tries to send
registration data to the central server 70.

As long as the network between the node and server 70 is functional, the
node transmits registration data to the central server 70 using any one
of a number of wide area network protocols. For example, the node may
transmit registration information data to the central server 70 using the
LDAP protocol. As the node transmits registration information data to the
central server 70, it may remove the cached information from its memory
element. However, it is preferable for the node to continue caching
registration information data until data must be removed from the cache
because the cache is too full. Any cache replacement mechanism currently
known in the art may be used to remove registration information data from
the node's cache, including least-recently-used, first-in-first-out, or
random replacement. The central server 70 records the transmitted data
upon receipt.

As noted above, a user accessing the system via the World Wide Web would
encounter a welcome page on a Web site which allows the user to create an
alias and a user profile. A user accessing the system via a kiosk may
also encounter such a welcome page, although the kiosk may provide a
button which the user presses to begin the user profile creation process.
Regardless of how the process is begun, the user selects an alias and
indicates his or her selection to the Web site or kiosk. The alias
selection may be indicated to the node by typing the alias on a keyboard,
by providing an alias on a magnetically-striped card which is swiped to
[illegible] alias from the card, or the alias may be entered by voice
recognition. Once the node has received the alias information from the
user it requests that the user select a password which prevents the
user's profile from being used [illegible] anyone other than that user.
Additionally, the node may [illegible] from the user such as demographic
information.

[illegible] depicts one embodiment of the steps taken by a [illegible]
that an alias is not [illegible] use (step 706). Once the [illegible] has
acquired all the desired information from the user, [illegible] verify
that the alias selected by the new user is not [illegible]. In one
embodiment, the node determines if there is [illegible] identification
code stored [illegible] already been selected and the alias is invalid
(step 804). The [illegible] node identification code is present
[illegible] database then the alias is valid and the [illegible]
described in connection with FIG. 7 continues. [illegible] has selected
the [illegible] the identity of the node on which [illegible]
identification codes. Therefore, as long as the node has stored in a
memory element a list or table of every user alias associated with
[illegible] profiles [illegible] node, and as long as no alias conflict
exists with the aliases stored in that local list or table, then
uniqueness of the identification code which will be assigned to the user
based on the alias chosen is guaranteed.

If the node determines that the alias can be assigned to the user, the
node makes the assignment and prepares to transmit the registration
information to the central server 70. As noted above, the node will
attempt to transmit the registration information over the network (step
714); however, the network may be busy or the network may be
nonfunctional due to environmental and physical factors, such as downed
lines. If the network between the node and the central server 70 is, in
fact, nonfunctional then the node will store the registration information
data locally in a memory element and will continue to attempt to transmit
the user registration information periodically. While the network is down
and the node is attempting to transmit cached user registration
information data, other users may create user profiles and users may
utilize the node to obtain recommendations for items and to provide
ratings for items.

In one embodiment, if the user provides ratings for items while the
registration information is cached, then the node may cache the ratings
provided by the user with the user registration information. In this
embodiment, when the node successfully transmits the user registration
information to the central server 70 it also transmits ratings provided
by the user. User registration information may be cached at the node
using a buffer in which data is entered when the node desires to transmit
it to the central server 70, or data stored by the node may be provided
with a flag or bit which indicates when the data should be sent to the
central server 70 and when it has already been sent to the central server
70.

Once a user has created a user profile on a node, and the node has
transmitted the user registration information to the central server 70,
then that user is able to log in to any node present on the network. For
example, referring to FIG. 6, a user may create his or her user profile
using Web site 62.
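The registration behavior described above (alias check on the node, combination of node and user codes into one identification code, and cached transmission to the central server 70) may be illustrated by the following minimal Python sketch. It is not part of the original disclosure. The names make_global_id, Registration, Node, register and flush are hypothetical, the sequential per-node user code is an assumption (the description states only that 4 bytes indicate the user's selected alias), and the send callback stands in for whatever network protocol a node actually uses.

```python
import struct
from dataclasses import dataclass, field


def make_global_id(node_id: int, local_user_id: int) -> bytes:
    """Pack a 4-byte node code and a 4-byte per-node user code into 8 bytes."""
    return struct.pack(">II", node_id, local_user_id)


@dataclass
class Registration:
    global_id: bytes
    alias: str
    sent: bool = False  # flag: has the central server received this record?


@dataclass
class Node:
    node_id: int
    aliases: dict = field(default_factory=dict)  # alias -> local user code
    cache: list = field(default_factory=list)    # cached Registration records

    def register(self, alias: str) -> bytes:
        # Uniqueness only needs to be checked against this node's own list,
        # because the node code makes the combined identifier unique.
        if alias in self.aliases:
            raise ValueError("alias already in use on this node")
        local_id = len(self.aliases) + 1  # assumed numbering scheme
        self.aliases[alias] = local_id
        gid = make_global_id(self.node_id, local_id)
        self.cache.append(Registration(gid, alias))
        return gid

    def flush(self, send) -> int:
        """Try to send every unsent record; `send` returns True on success.

        Records stay cached after delivery and are only marked as sent,
        so a daemon or thread can call this periodically.
        """
        delivered = 0
        for reg in self.cache:
            if not reg.sent and send(reg):
                reg.sent = True
                delivered += 1
        return delivered
```

A node would call flush repeatedly from its background process. While the network is down, send returns False and the records simply remain flagged as unsent until a later attempt succeeds.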
Once Web site 62 has transmitted the user registration information to the
central server 70, then the user is able to log in to the kiosks 66, 68,
72, 72', or 72" and the user's profile will be available to those nodes.

When a user logs in to a node that is different from the one on which it
created its user profile that node must verify the user is a valid user
of the entire system. The node does this by first checking its local
database to determine if it can verify the user's identification code.
The node's local database storage may include users in addition to the
ones that created their user profile on that node.

If the user's identification code is not stored locally then the node
transmits a user verification request to the central server 70. In the
event that the network between the node and the communication server 70
is nonfunctional, the user verification request fails and the user is not
able to log on to the node at that time.

If the user verification request is successfully transmitted to the
central server 70, then the central server 70 determines if the
transmitted alias and password information or network id and password is
valid for any user of the system. The central server 70 stores
information associating user aliases with the appropriate passwords in
any manner including as a table. If the user's alias and password matches
then the central server 70 sends a message back to the node that the user
is verified and the user's log in is successful. In addition, the central
server 70 may transmit additional information associated with the user,
such as demographic information. In alternative embodiments, the central
server 70 may transmit only the user verification message and the node
must request additional information from the central server 70. If the
central server 70 does not find the user's alias and password combination
in its data store, then it transmits a verification failed message to the
node and the user's log in to the system is unsuccessful.

A node requests user information from the central server 70 in response
to a number of stimuli. In the example above, the node requested user
information in response to the user's attempt to log in to the node.
However, the node may also request user information for its own purposes.
For example, a node may desire to send an advertisement to users having a
particular demographic profile. The node could request demographic data
for each user logged into the node. The central server 70 would transmit
the demographic data to the node and the node could then select one or
more users to which to display an advertisement.

It is desirable to provide users with an ability to control the entities
to which the central server 70 will transmit data about that user. It is
further desirable to allow the users to select certain types of
information which should not be transmitted, e.g., a user may wish to
have preference data transmitted but not demographic data.

In one embodiment, the central server 70 hosts a table which associates
users, sites, and types of information. Alternatively, the server may
host separate tables, one of which associates users and sites and one of
which associates users and types of information. In this embodiment, the
server 70 is required to access two tables to determine if data may be
sent to the central server 70. Regardless of whether one table or
multiple tables are used, when the server 70 receives a request for user
data it queries the table to determine if data should be sent.

The table or tables may be populated with bytes or bits which act as
flags enabling or disabling transmission of data from the central server
70. For example, a "0" may indicate to the central server 70 that data
should not be transmitted and a "1" could indicate the data can be
transmitted. Using this convention, a user having all "0" entries would
not allow any information to be transmitted to a node. While this would
provide the user with a high degree of privacy, it would inhibit the
nodes from making recommendations to the user because the nodes would be
unable to access the user's preference data stored on the central server
70.

Data transmitted from the central server 70 may be encrypted in order to
prevent a breach of the user's privacy. In some embodiments, the central
server 70 sends the encrypted data to the node together with a key that
the node will need to decrypt the data. In other embodiments, the central
server 70 sends the encrypted data to the node and assumes that the node
has the key required to decrypt the data. For example, the key used to
encrypt the data may be the user's password. Since the node received the
user's password from the user during log in, the node will be able to
decrypt the user information.

In other embodiments, encryption can be used to allow the user to control
access to his or her data. For example, the user profile information may
be encrypted with multiple keys and only a node with all of the
encryption keys may access the data. For example, the central server 70
may use the node's identification code, assigned above, as a first
encryption key and may use the user's password as a second encryption
key. A requesting node would receive the user's information from the
central server 70 but, unless it has both its own identification code and
the user's password, the node would be unable to decrypt the user's
information.

Additional encryption keys may be specified by the user to control which
nodes or which information are transmitted. For example, the user may
indicate that only preference data should be transmitted to nodes. A
preference data encryption key can be assigned by the central server 70
and used to encrypt the user's preference data. A node requesting the
user's preference data may be given the preference data encryption key by
the central server 70 at the time of the request, or the central server
70 may transmit the preference data encryption key to all nodes
periodically. Transmitting encryption keys periodically allows those keys
to be updated to further strengthen the security of the system.

The actual type of encryption used may vary depending on the geographic
scope of the network. For example, a network spanning international
boundaries could use the DES encryption standard in order to provide
users with a fair degree of privacy while complying with United States
export laws. However, other encryption standards may be used such as PGP.

Use of encryption to secure user information data enables the formation
of an information marketplace. In embodiments in which the node has the
option of decrypting the received user information data itself or
requesting the central server 70 to decrypt the user profile information,
the central server 70 may charge the node a fee for decrypting the data.
Such a fee could be based on any one of a number of factors, such as the
amount of information to be decrypted, the type of information, or a fee
selected by the user to indicate how valuable the user perceives the
associated profile information.

In other embodiments, the central server 70 can charge for the decryption
key itself. In these embodiments, a node requesting user profile
information would pay a fee to receive the decryption key from the
central server 70 and can use the decryption key to decrypt the user
profile information transmitted to it by the central server 70. In these
embodiments, the central server 70 would likely change the decryption key
periodically in order to require nodes to pay on a periodic basis. The
fee charged for each decryption key could vary as described above, and in
addition, could vary in response to the length of time the decryption key
is valid. For example, a decryption key which will be valid longer would
support a larger fee than a decryption key that would expire quickly.

In some embodiments, user profile information is segregated into profile
sections. A profile section represents user preference data for a
particular group of items or items having a particular feature. For
example, a user's preference information may be broken down into a
profile section [illegible] one relating to cooking, one relating to
restaurants, and so on. A node typically requests only the profile
section which is relevant to the domain in which it operates. If a fee is
charged for certain user information, or for decryption/encryption keys
associated with certain information, those fees may also be set depending
on the value of a particular profile section. The value accorded to a
particular profile section may vary depending on many factors, including
the number of ratings present in a particular section, the validity of
the ratings in a particular section (i.e. the quality of the data in that
section), the number of consuming nodes in the marketplace for the
profile section information, the number of users that have allowed
transmission of the data contained in the profile section, and others.

Having described preferred embodiments of the invention, it will now
become apparent to one of skill in the art that other embodiments
incorporating the concepts may be used. It is felt, therefore, that these
embodiments should not be limited to disclosed [illegible] limited only
by the spirit and scope of the following claims.

What is claimed is:

1. Client-server based apparatus for recommending an item to one of a
plurality of users situated at a client computer, the client computer
connected to a server computer, wherein the item has not yet been rated
by the one user, the apparatus comprising:

(A) the server computer having a first memory associated therewith,
wherein the server:

(A1) stores a user profile, in the first memory, for each of a plurality
of users, wherein the user profile comprises a separate rating value,
supplied by a particular one of the users, for each corresponding one of
a plurality of items, said items including the item non-rated by the
user;

(A2) stores an item profile, in the first memory, for each of the rated
items, comprises a separate rating value, for a particular one of the
items, provided by each one of the plurality of the users, wherein the
user profile and the item profile are distinct from each other, and

(A3) in response to a request issued by the client computer, accesses
rating information from the user and item profiles stored in the first
memory and provides the rating information to the client computer; and

(B) the client computer comprising:

(B1) a processor; and

(B2) a second memory, connected to the processor, for storing computer
executable instructions therein; and

(B3) wherein the processor, in response to the executable instructions:

(B3a) issues, in response to interaction with the one user, the request
to the server computer for the rating information;

(B3b) calculates, for each one of the plurality of users and in response
to the rating information received from the server computer, a plurality
of similarity factors, between said each one user and at least one other
one of the users, for each of said items, including said non-rated item;

(B3c) selects, in response to the plurality of similarity factors and for
each one of the plurality of users, a plurality of neighboring ones of
the users, such that each of the neighboring ones of the users has an
associated similarity factor which is greater than a first predefined
threshold value or, if a confidence factor is associated with the
associated similarity factor, both the associated similarity factor
[illegible] predefined threshold and the [illegible] predefined
[illegible]

(B3d) [illegible] corresponding weight to each of the neighboring users
so as to define a plurality of weights; and

(B3e) recommends at least one of a plurality of the items to said one
user in response to the plurality of weights and ratings given to the
non-rated item by the neighboring ones of the users.

2. The apparatus in claim 1 wherein the processor, in response to the
executable instructions:

obtains additional rating information for at least one item from a given
one of the users; and

supplies the additional rating information to the server computer; and

the server computer updates, in response to the additional information,
the user and item profiles stored in the first memory.

3. The apparatus in claim 2 wherein profile data, forming the user and
item profiles stored in the first memory, is organized into a plurality
of profile sections and the request specifies a particular one of the
profile sections for which rating information is to be provided by the
server computer, [illegible] associated with a class of said items
through which a recommendation could be obtained through the client
computer to the one user.

4. The apparatus in claim 3 wherein the client computer is situated at a
kiosk.

5. The apparatus in claim 3 wherein the server computer further comprises
means for permitting a plurality of users, each stationed at a different
one of a plurality of client computers, to share information amongst
themselves related to the items.

6. The apparatus in claim 3 wherein the user profile comprises either a
set of user n-tuples or a first set of pointers to storage locations in
the memory at which user entries are stored, wherein each of the user
n-tuples comprises a separate rating value, supplied by a particular one
of the users, for each corresponding one of a plurality of items, said
items including the item non-rated by the user and each of the user
entries stores a separate rating for an associated one of the users for a
corresponding one of the items.

7. The apparatus in claim 6 wherein the item profile comprises either a
set of item n-tuples or a second set of pointers to storage locations in
the memory at which item entries are stored, wherein each of the item
n-tuples comprises a separate rating value, for a particular one of the
items, provided by each one of the plurality of the users and each of the
item entries stores a rating for an associated one of the items by a
corresponding one of the users.

8. The apparatus in claim 7 wherein the calculating step further
comprises the steps of:
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receiving a rating from the given one of the users for a 
given one of the plurality of items so as to define a 
received rating; 

updating the user profile associated with the given one 
user by either writing the received rating into an 
appropriate one of the user n-tuples or writing the 
received rating into an appropriate one of the user 
entries; 

updating the item profile associated with the given one 
item by writing the received rating into an appropriate 
one of the item n-tuples or writing the received rating 
into an appropriate one of the item entries; and 

calculating, for the given one user, a plurality of said 
similarity factors, wherein each of the similarity factors 
for said given one user represents similarity between 
said given one user and another one of the users with 
respect to the items.

9. The apparatus in claim 3 wherein the item profile 
comprises either a set of item n-tuples or a second set of 
pointers to storage locations in the memory at which item 
entries are stored, wherein each of the item n-tuples com- 
prises a separate rating value, for a particular one of the 
items, provided by each one of the plurality of the users and
each of the item entries stores a rating for an associated one 
of the items by a corresponding one of the users. 

10. The apparatus in claim 9 wherein the user profile 
comprises either a set of user n-tuples or a first set of pointers 
to storage locations in the memory at which user entries are
stored, wherein each of the user n-tuples comprises a sepa-
rate rating value, supplied by a particular one of the users,
for each corresponding one of a plurality of items, said items 
including the item non-rated by the user and each of the user 
entries stores a separate rating for an associated one of the
users for a corresponding one of the items. 

11. The apparatus in claim 10 wherein the calculating step 
further comprises the steps of: 
receiving a rating from the given one of the users for a 
given one of the plurality of items so as to define a 
received rating; 
updating the user profile associated with the given one 
user by either writing the received rating into an 
appropriate one of the user n-tuples or writing the
received rating into an appropriate one of the user
entries;

updating the item profile associated with the given one 
item by writing the received rating into an appropriate 
one of the item n-tuples or writing the received rating 
into an appropriate one of the item entries; and 

calculating, for the given one user, a plurality of said 
similarity factors, wherein each of the similarity factors 
for said given one user represents similarity between
said given one user and another one of the users with 
respect to the items. 
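
For illustration only, and without limiting the claims, the profile updating recited in claims 8 and 11 and the recommending steps recited in claim 1 may be sketched in Python as follows. The sketch is an assumption-laden example rather than the claimed implementation: the names user_profiles, item_profiles, rate, similarity and recommend are hypothetical, the similarity measure (inverse of one plus the mean squared rating difference) is merely one possible choice, each neighbor's weight is taken to be its similarity factor, the threshold value of 0.5 is arbitrary, and confidence factors are omitted.

```python
from collections import defaultdict

user_profiles = defaultdict(dict)  # user -> {item: rating}
item_profiles = defaultdict(dict)  # item -> {user: rating}


def rate(user, item, rating):
    """Write a received rating into both the user and the item profile."""
    user_profiles[user][item] = rating
    item_profiles[item][user] = rating


def similarity(a, b):
    """Assumed similarity factor in (0, 1]; 0.0 when no items are shared."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    msd = sum((a[i] - b[i]) ** 2 for i in common) / len(common)
    return 1.0 / (1.0 + msd)


def recommend(user, item, threshold=0.5):
    """Predict a rating for an item the user has not rated.

    Neighbors are users who rated the item and whose similarity factor
    exceeds the threshold; the result is the weighted average of their
    ratings, or None when no neighbor qualifies.
    """
    target = user_profiles[user]
    weights = {}
    for other in item_profiles[item]:
        if other == user:
            continue
        s = similarity(target, user_profiles[other])
        if s > threshold:
            weights[other] = s  # weight assumed equal to the similarity
    total = sum(weights.values())
    if total == 0:
        return None
    return sum(w * item_profiles[item][n] for n, w in weights.items()) / total
```

Keeping the item profile alongside the user profile lets recommend look only at users who actually rated the item, which is one reason a system might store the two profiles separately.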

***** 
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